Technologies for Managing Resource Allocation With a Hierarchical Model

ABSTRACT

Technologies for allocating resources of a set of managed nodes to workloads with a hierarchical model include an orchestrator server to receive resource allocation objective data. The orchestrator server is further to determine an initial assignment of a set of workloads among the managed nodes, receive telemetry data from the managed nodes, generate, from the telemetry data, a hierarchical model indicative of the resource utilization of each managed node, determine, with the hierarchical model, differences in resource utilization for each workload as a function of the managed node that performed the workload, determine, as a function of the telemetry data and the determined differences, an adjustment to the assignment of the workloads to increase an achievement of at least one of the resource allocation objectives without decreasing the achievement of any of the other resource allocation objectives, and apply the adjustment to the assignments of the workloads among the managed nodes as the workloads are performed.

CROSS-REFERENCE TO RELATED APPLICATIONS

The present application claims the benefit of U.S. Provisional Patent Application No. 62/365,969, filed Jul. 22, 2016, U.S. Provisional Patent Application No. 62/376,859, filed Aug. 18, 2016, and U.S. Provisional Patent Application No. 62/427,268, filed Nov. 29, 2016.

BACKGROUND

In a typical cloud-based computing environment (e.g., a data center), multiple compute nodes may execute workloads (e.g., processes, applications, services, etc.) on behalf of customers. The workloads may exhibit different resource usages based on the particular operations they perform. Furthermore, different compute nodes in the data center may have different characteristics, either because the compute nodes include differing hardware and/or are located in different places within the data center. For example, some compute nodes may be equipped with a more efficient processor, faster or more memory, or more efficient fans, than other compute nodes in the same data center. Furthermore, different physical areas in a data center may have different ambient temperatures that affect the internal temperatures of the compute nodes. Even when all of the compute nodes and temperatures within a data center are initially homogenous, differences may arise over time, such as when components are upgraded, compute nodes are throttled to extend their life expectancy, or other compute nodes are overclocked to increase their performance. As such, a theoretically efficient distribution of workloads among the compute nodes may, in reality, be suboptimal if it is based on the assumption that the compute nodes themselves are identical in their hardware capabilities and are placed in locations having identical characteristics.

BRIEF DESCRIPTION OF THE DRAWINGS

The concepts described herein are illustrated by way of example and not by way of limitation in the accompanying figures. For simplicity and clarity of illustration, elements illustrated in the figures are not necessarily drawn to scale. Where considered appropriate, reference labels have been repeated among the figures to indicate corresponding or analogous elements.

FIG. 1 is a diagram of a conceptual overview of a data center in which one or more techniques described herein may be implemented according to various embodiments;

FIG. 2 is a diagram of an example embodiment of a logical configuration of a rack of the data center of FIG. 1;

FIG. 3 is a diagram of an example embodiment of another data center in which one or more techniques described herein may be implemented according to various embodiments;

FIG. 4 is a diagram of another example embodiment of a data center in which one or more techniques described herein may be implemented according to various embodiments;

FIG. 5 is a diagram of a connectivity scheme representative of link-layer connectivity that may be established among various sleds of the data centers of FIGS. 1, 3, and 4;

FIG. 6 is a diagram of a rack architecture that may be representative of an architecture of any particular one of the racks depicted in FIGS. 1-4 according to some embodiments;

FIG. 7 is a diagram of an example embodiment of a sled that may be used with the rack architecture of FIG. 6;

FIG. 8 is a diagram of an example embodiment of a rack architecture to provide support for sleds featuring expansion capabilities;

FIG. 9 is a diagram of an example embodiment of a rack implemented according to the rack architecture of FIG. 8;

FIG. 10 is a diagram of an example embodiment of a sled designed for use in conjunction with the rack of FIG. 9;

FIG. 11 is a diagram of an example embodiment of a data center in which one or more techniques described herein may be implemented according to various embodiments;

FIG. 12 is a simplified block diagram of at least one embodiment of a system for managing the assignment of workloads among a set of managed nodes based on a hierarchical model of the managed nodes;

FIG. 13 is a simplified block diagram of at least one embodiment of an orchestrator server of the system of FIG. 12;

FIG. 14 is a simplified block diagram of at least one embodiment of an environment that may be established by the orchestrator server of FIGS. 12 and 13;

FIGS. 15-17 are a simplified flow diagram of at least one embodiment of a method for assigning workloads among a set of managed nodes based on a hierarchical model of the managed nodes that may be performed by the orchestrator server of FIGS. 12-14; and

FIG. 18 is a simplified diagram of at least one embodiment of a hierarchical model that may be established by the orchestrator server of FIGS. 12 and 13.

DETAILED DESCRIPTION OF THE DRAWINGS

While the concepts of the present disclosure are susceptible to various modifications and alternative forms, specific embodiments thereof have been shown by way of example in the drawings and will be described herein in detail. It should be understood, however, that there is no intent to limit the concepts of the present disclosure to the particular forms disclosed, but on the contrary, the intention is to cover all modifications, equivalents, and alternatives consistent with the present disclosure and the appended claims.

References in the specification to “one embodiment,” “an embodiment,” “an illustrative embodiment,” etc., indicate that the embodiment described may include a particular feature, structure, or characteristic, but every embodiment may or may not necessarily include that particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same embodiment. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, it is submitted that it is within the knowledge of one skilled in the art to effect such feature, structure, or characteristic in connection with other embodiments whether or not explicitly described. Additionally, it should be appreciated that items included in a list in the form of “at least one A, B, and C” can mean (A); (B); (C); (A and B); (A and C); (B and C); or (A, B, and C). Similarly, items listed in the form of “at least one of A, B, or C” can mean (A); (B); (C); (A and B); (A and C); (B and C); or (A, B, and C).

The disclosed embodiments may be implemented, in some cases, in hardware, firmware, software, or any combination thereof. The disclosed embodiments may also be implemented as instructions carried by or stored on a transitory or non-transitory machine-readable (e.g., computer-readable) storage medium, which may be read and executed by one or more processors. A machine-readable storage medium may be embodied as any storage device, mechanism, or other physical structure for storing or transmitting information in a form readable by a machine (e.g., a volatile or non-volatile memory, a media disc, or other media device).

In the drawings, some structural or method features may be shown in specific arrangements and/or orderings. However, it should be appreciated that such specific arrangements and/or orderings may not be required. Rather, in some embodiments, such features may be arranged in a different manner and/or order than shown in the illustrative figures. Additionally, the inclusion of a structural or method feature in a particular figure is not meant to imply that such feature is required in all embodiments and, in some embodiments, may not be included or may be combined with other features.

FIG. 1 illustrates a conceptual overview of a data center 100 that may generally be representative of a data center or other type of computing network in/for which one or more techniques described herein may be implemented according to various embodiments. As shown in FIG. 1, data center 100 may generally contain a plurality of racks, each of which may house computing equipment comprising a respective set of physical resources. In the particular non-limiting example depicted in FIG. 1, data center 100 contains four racks 102A to 102D, which house computing equipment comprising respective sets of physical resources (PCRs) 105A to 105D. According to this example, a collective set of physical resources 106 of data center 100 includes the various sets of physical resources 105A to 105D that are distributed among racks 102A to 102D. Physical resources 106 may include resources of multiple types, such as—for example—processors, co-processors, accelerators, field-programmable gate arrays (FPGAs), memory, and storage. The embodiments are not limited to these examples.

The illustrative data center 100 differs from typical data centers in many ways. For example, in the illustrative embodiment, the circuit boards (“sleds”) on which components such as CPUs, memory, and other components are placed are designed for increased thermal performance. In particular, in the illustrative embodiment, the sleds are shallower than typical boards. In other words, the sleds are shorter from the front to the back, where cooling fans are located. This decreases the length of the path that air must travel across the components on the board. Further, the components on the sled are spaced further apart than in typical circuit boards, and the components are arranged to reduce or eliminate shadowing (i.e., one component in the air flow path of another component). In the illustrative embodiment, processing components such as the processors are located on a top side of a sled while near memory, such as DIMMs, is located on a bottom side of the sled. As a result of the enhanced airflow provided by this design, the components may operate at higher frequencies and power levels than in typical systems, thereby increasing performance. Furthermore, the sleds are configured to blindly mate with power and data communication cables in each rack 102A, 102B, 102C, 102D, enhancing their ability to be quickly removed, upgraded, reinstalled, and/or replaced. Similarly, individual components located on the sleds, such as processors, accelerators, memory, and data storage drives, are configured to be easily upgraded due to their increased spacing from each other. In the illustrative embodiment, the components additionally include hardware attestation features to prove their authenticity.

Furthermore, in the illustrative embodiment, the data center 100 utilizes a single network architecture (“fabric”) that supports multiple other network architectures including Ethernet and Omni-Path. The sleds, in the illustrative embodiment, are coupled to switches via optical fibers, which provide higher bandwidth and lower latency than typical twisted pair cabling (e.g., Category 5, Category 5e, Category 6, etc.). Due to the high bandwidth, low latency interconnections and network architecture, the data center 100 may, in use, pool resources, such as memory, accelerators (e.g., graphics accelerators, FPGAs, ASICs, etc.), and data storage drives that are physically disaggregated, and provide them to compute resources (e.g., processors) on an as-needed basis, enabling the compute resources to access the pooled resources as if they were local. The illustrative data center 100 additionally receives usage information for the various resources, predicts resource usage for different types of workloads based on past resource usage, and dynamically reallocates the resources based on this information.

The racks 102A, 102B, 102C, 102D of the data center 100 may include physical design features that facilitate the automation of a variety of types of maintenance tasks. For example, data center 100 may be implemented using racks that are designed to be robotically-accessed, and to accept and house robotically-manipulatable resource sleds. Furthermore, in the illustrative embodiment, the racks 102A, 102B, 102C, 102D include integrated power sources that receive a greater voltage than is typical for power sources. The increased voltage enables the power sources to provide additional power to the components on each sled, enabling the components to operate at higher than typical frequencies.

FIG. 2 illustrates an exemplary logical configuration of a rack 202 of the data center 100. As shown in FIG. 2, rack 202 may generally house a plurality of sleds, each of which may comprise a respective set of physical resources. In the particular non-limiting example depicted in FIG. 2, rack 202 houses sleds 204-1 to 204-4 comprising respective sets of physical resources 205-1 to 205-4, each of which constitutes a portion of the collective set of physical resources 206 comprised in rack 202. With respect to FIG. 1, if rack 202 is representative of—for example—rack 102A, then physical resources 206 may correspond to the physical resources 105A comprised in rack 102A. In the context of this example, physical resources 105A may thus be made up of the respective sets of physical resources, including physical storage resources 205-1, physical accelerator resources 205-2, physical memory resources 205-3, and physical compute resources 205-4 comprised in the sleds 204-1 to 204-4 of rack 202. The embodiments are not limited to this example. Each sled may contain a pool of each of the various types of physical resources (e.g., compute, memory, accelerator, storage). By having robotically accessible and robotically manipulatable sleds comprising disaggregated resources, each type of resource can be upgraded independently of the others and at its own optimized refresh rate.

FIG. 3 illustrates an example of a data center 300 that may generally be representative of one in/for which one or more techniques described herein may be implemented according to various embodiments. In the particular non-limiting example depicted in FIG. 3, data center 300 comprises racks 302-1 to 302-32. In various embodiments, the racks of data center 300 may be arranged in such fashion as to define and/or accommodate various access pathways. For example, as shown in FIG. 3, the racks of data center 300 may be arranged in such fashion as to define and/or accommodate access pathways 311A, 311B, 311C, and 311D. In some embodiments, the presence of such access pathways may generally enable automated maintenance equipment, such as robotic maintenance equipment, to physically access the computing equipment housed in the various racks of data center 300 and perform automated maintenance tasks (e.g., replace a failed sled, upgrade a sled). In various embodiments, the dimensions of access pathways 311A, 311B, 311C, and 311D, the dimensions of racks 302-1 to 302-32, and/or one or more other aspects of the physical layout of data center 300 may be selected to facilitate such automated operations. The embodiments are not limited in this context.

FIG. 4 illustrates an example of a data center 400 that may generally be representative of one in/for which one or more techniques described herein may be implemented according to various embodiments. As shown in FIG. 4, data center 400 may feature an optical fabric 412. Optical fabric 412 may generally comprise a combination of optical signaling media (such as optical cabling) and optical switching infrastructure via which any particular sled in data center 400 can send signals to (and receive signals from) each of the other sleds in data center 400. The signaling connectivity that optical fabric 412 provides to any given sled may include connectivity both to other sleds in a same rack and sleds in other racks. In the particular non-limiting example depicted in FIG. 4, data center 400 includes four racks 402A to 402D. Racks 402A to 402D house respective pairs of sleds 404A-1 and 404A-2, 404B-1 and 404B-2, 404C-1 and 404C-2, and 404D-1 and 404D-2. Thus, in this example, data center 400 comprises a total of eight sleds. Via optical fabric 412, each such sled may possess signaling connectivity with each of the seven other sleds in data center 400. For example, via optical fabric 412, sled 404A-1 in rack 402A may possess signaling connectivity with sled 404A-2 in rack 402A, as well as the six other sleds 404B-1, 404B-2, 404C-1, 404C-2, 404D-1, and 404D-2 that are distributed among the other racks 402B, 402C, and 402D of data center 400. The embodiments are not limited to this example.

FIG. 5 illustrates an overview of a connectivity scheme 500 that may generally be representative of link-layer connectivity that may be established in some embodiments among the various sleds of a data center, such as any of example data centers 100, 300, and 400 of FIGS. 1, 3, and 4. Connectivity scheme 500 may be implemented using an optical fabric that features a dual-mode optical switching infrastructure 514. Dual-mode optical switching infrastructure 514 may generally comprise a switching infrastructure that is capable of receiving communications according to multiple link-layer protocols via a same unified set of optical signaling media, and properly switching such communications. In various embodiments, dual-mode optical switching infrastructure 514 may be implemented using one or more dual-mode optical switches 515. In various embodiments, dual-mode optical switches 515 may generally comprise high-radix switches. In some embodiments, dual-mode optical switches 515 may comprise multi-ply switches, such as four-ply switches. In various embodiments, dual-mode optical switches 515 may feature integrated silicon photonics that enable them to switch communications with significantly reduced latency in comparison to conventional switching devices. In some embodiments, dual-mode optical switches 515 may constitute leaf switches 530 in a leaf-spine architecture additionally including one or more dual-mode optical spine switches 520.

In various embodiments, dual-mode optical switches may be capable of receiving both Ethernet protocol communications carrying Internet Protocol (IP) packets and communications according to a second, high-performance computing (HPC) link-layer protocol (e.g., Intel's Omni-Path Architecture, Infiniband) via optical signaling media of an optical fabric. As reflected in FIG. 5, with respect to any particular pair of sleds 504A and 504B possessing optical signaling connectivity to the optical fabric, connectivity scheme 500 may thus provide support for link-layer connectivity via both Ethernet links and HPC links. Thus, both Ethernet and HPC communications can be supported by a single high-bandwidth, low-latency switch fabric. The embodiments are not limited to this example.

FIG. 6 illustrates a general overview of a rack architecture 600 that may be representative of an architecture of any particular one of the racks depicted in FIGS. 1 to 4 according to some embodiments. As reflected in FIG. 6, rack architecture 600 may generally feature a plurality of sled spaces into which sleds may be inserted, each of which may be robotically-accessible via a rack access region 601. In the particular non-limiting example depicted in FIG. 6, rack architecture 600 features five sled spaces 603-1 to 603-5. Sled spaces 603-1 to 603-5 feature respective multi-purpose connector modules (MPCMs) 616-1 to 616-5.

FIG. 7 illustrates an example of a sled 704 that may be representative of a sled of such a type. As shown in FIG. 7, sled 704 may comprise a set of physical resources 705, as well as an MPCM 716 designed to couple with a counterpart MPCM when sled 704 is inserted into a sled space such as any of sled spaces 603-1 to 603-5 of FIG. 6. Sled 704 may also feature an expansion connector 717. Expansion connector 717 may generally comprise a socket, slot, or other type of connection element that is capable of accepting one or more types of expansion modules, such as an expansion sled 718. By coupling with a counterpart connector on expansion sled 718, expansion connector 717 may provide physical resources 705 with access to supplemental computing resources 705B residing on expansion sled 718. The embodiments are not limited in this context.

FIG. 8 illustrates an example of a rack architecture 800 that may be representative of a rack architecture that may be implemented in order to provide support for sleds featuring expansion capabilities, such as sled 704 of FIG. 7. In the particular non-limiting example depicted in FIG. 8, rack architecture 800 includes seven sled spaces 803-1 to 803-7, which feature respective MPCMs 816-1 to 816-7. Sled spaces 803-1 to 803-7 include respective primary regions 803-1A to 803-7A and respective expansion regions 803-1B to 803-7B. With respect to each such sled space, when the corresponding MPCM is coupled with a counterpart MPCM of an inserted sled, the primary region may generally constitute a region of the sled space that physically accommodates the inserted sled. The expansion region may generally constitute a region of the sled space that can physically accommodate an expansion module, such as expansion sled 718 of FIG. 7, in the event that the inserted sled is configured with such a module.

FIG. 9 illustrates an example of a rack 902 that may be representative of a rack implemented according to rack architecture 800 of FIG. 8 according to some embodiments. In the particular non-limiting example depicted in FIG. 9, rack 902 features seven sled spaces 903-1 to 903-7, which include respective primary regions 903-1A to 903-7A and respective expansion regions 903-1B to 903-7B. In various embodiments, temperature control in rack 902 may be implemented using an air cooling system. For example, as reflected in FIG. 9, rack 902 may feature a plurality of fans 919 that are generally arranged to provide air cooling within the various sled spaces 903-1 to 903-7. In some embodiments, the height of the sled space is greater than the conventional “1U” server height. In such embodiments, fans 919 may generally comprise relatively slow, large diameter cooling fans as compared to fans used in conventional rack configurations. Running larger diameter cooling fans at lower speeds may increase fan lifetime relative to smaller diameter cooling fans running at higher speeds while still providing the same amount of cooling. The sleds are physically shallower than conventional rack dimensions. Further, components are arranged on each sled to reduce thermal shadowing (i.e., not arranged serially in the direction of air flow). As a result, the wider, shallower sleds allow for an increase in device performance because the devices can be operated at a higher thermal envelope (e.g., 250 W) due to improved cooling (i.e., no thermal shadowing, more space between devices, more room for larger heat sinks, etc.).

MPCMs 916-1 to 916-7 may be configured to provide inserted sleds with access to power sourced by respective power modules 920-1 to 920-7, each of which may draw power from an external power source 921. In various embodiments, external power source 921 may deliver alternating current (AC) power to rack 902, and power modules 920-1 to 920-7 may be configured to convert such AC power to direct current (DC) power to be sourced to inserted sleds. In some embodiments, for example, power modules 920-1 to 920-7 may be configured to convert 277-volt AC power into 12-volt DC power for provision to inserted sleds via respective MPCMs 916-1 to 916-7. The embodiments are not limited to this example.

MPCMs 916-1 to 916-7 may also be arranged to provide inserted sleds with optical signaling connectivity to a dual-mode optical switching infrastructure 914, which may be the same as—or similar to—dual-mode optical switching infrastructure 514 of FIG. 5. In various embodiments, optical connectors contained in MPCMs 916-1 to 916-7 may be designed to couple with counterpart optical connectors contained in MPCMs of inserted sleds to provide such sleds with optical signaling connectivity to dual-mode optical switching infrastructure 914 via respective lengths of optical cabling 922-1 to 922-7. In some embodiments, each such length of optical cabling may extend from its corresponding MPCM to an optical interconnect loom 923 that is external to the sled spaces of rack 902. In various embodiments, optical interconnect loom 923 may be arranged to pass through a support post or other type of load-bearing element of rack 902. The embodiments are not limited in this context. Because inserted sleds connect to an optical switching infrastructure via MPCMs, the resources typically spent in manually configuring the rack cabling to accommodate a newly inserted sled can be saved.

FIG. 10 illustrates an example of a sled 1004 that may be representative of a sled designed for use in conjunction with rack 902 of FIG. 9 according to some embodiments. Sled 1004 may feature an MPCM 1016 that comprises an optical connector 1016A and a power connector 1016B, and that is designed to couple with a counterpart MPCM of a sled space in conjunction with insertion of MPCM 1016 into that sled space. Coupling MPCM 1016 with such a counterpart MPCM may cause power connector 1016B to couple with a power connector comprised in the counterpart MPCM. This may generally enable physical resources 1005 of sled 1004 to source power from an external source, via power connector 1016B and power transmission media 1024 that conductively couples power connector 1016B to physical resources 1005.

Sled 1004 may also include dual-mode optical network interface circuitry 1026. Dual-mode optical network interface circuitry 1026 may generally comprise circuitry that is capable of communicating over optical signaling media according to each of multiple link-layer protocols supported by dual-mode optical switching infrastructure 914 of FIG. 9. In some embodiments, dual-mode optical network interface circuitry 1026 may be capable both of Ethernet protocol communications and of communications according to a second, high-performance protocol. In various embodiments, dual-mode optical network interface circuitry 1026 may include one or more optical transceiver modules 1027, each of which may be capable of transmitting and receiving optical signals over each of one or more optical channels. The embodiments are not limited in this context.

Coupling MPCM 1016 with a counterpart MPCM of a sled space in a given rack may cause optical connector 1016A to couple with an optical connector comprised in the counterpart MPCM. This may generally establish optical connectivity between optical cabling of the sled and dual-mode optical network interface circuitry 1026, via each of a set of optical channels 1025. Dual-mode optical network interface circuitry 1026 may communicate with the physical resources 1005 of sled 1004 via electrical signaling media 1028. In addition to the dimensions of the sleds and arrangement of components on the sleds to provide improved cooling and enable operation at a relatively higher thermal envelope (e.g., 250 W), as described above with reference to FIG. 9, in some embodiments, a sled may include one or more additional features to facilitate air cooling, such as a heatpipe and/or heat sinks arranged to dissipate heat generated by physical resources 1005. It is worthy of note that although the example sled 1004 depicted in FIG. 10 does not feature an expansion connector, any given sled that features the design elements of sled 1004 may also feature an expansion connector according to some embodiments. The embodiments are not limited in this context.

FIG. 11 illustrates an example of a data center 1100 that may generally be representative of one in/for which one or more techniques described herein may be implemented according to various embodiments. As reflected in FIG. 11, a physical infrastructure management framework 1150A may be implemented to facilitate management of a physical infrastructure 1100A of data center 1100. In various embodiments, one function of physical infrastructure management framework 1150A may be to manage automated maintenance functions within data center 1100, such as the use of robotic maintenance equipment to service computing equipment within physical infrastructure 1100A. In some embodiments, physical infrastructure 1100A may feature an advanced telemetry system that performs telemetry reporting that is sufficiently robust to support remote automated management of physical infrastructure 1100A. In various embodiments, telemetry information provided by such an advanced telemetry system may support features such as failure prediction/prevention capabilities and capacity planning capabilities. In some embodiments, physical infrastructure management framework 1150A may also be configured to manage authentication of physical infrastructure components using hardware attestation techniques. For example, robots may verify the authenticity of components before installation by analyzing information collected from a radio frequency identification (RFID) tag associated with each component to be installed. The embodiments are not limited in this context.

As shown in FIG. 11, the physical infrastructure 1100A of data center 1100 may comprise an optical fabric 1112, which may include a dual-mode optical switching infrastructure 1114. Optical fabric 1112 and dual-mode optical switching infrastructure 1114 may be the same as—or similar to—optical fabric 412 of FIG. 4 and dual-mode optical switching infrastructure 514 of FIG. 5, respectively, and may provide high-bandwidth, low-latency, multi-protocol connectivity among sleds of data center 1100. As discussed above, with reference to FIG. 1, in various embodiments, the availability of such connectivity may make it feasible to disaggregate and dynamically pool resources such as accelerators, memory, and storage. In some embodiments, for example, one or more pooled accelerator sleds 1130 may be included among the physical infrastructure 1100A of data center 1100, each of which may comprise a pool of accelerator resources—such as co-processors and/or FPGAs, for example—that is globally accessible to other sleds via optical fabric 1112 and dual-mode optical switching infrastructure 1114.

In another example, in various embodiments, one or more pooled storage sleds 1132 may be included among the physical infrastructure 1100A of data center 1100, each of which may comprise a pool of storage resources that is globally accessible to other sleds via optical fabric 1112 and dual-mode optical switching infrastructure 1114. In some embodiments, such pooled storage sleds 1132 may comprise pools of solid-state storage devices such as solid-state drives (SSDs). In various embodiments, one or more high-performance processing sleds 1134 may be included among the physical infrastructure 1100A of data center 1100. In some embodiments, high-performance processing sleds 1134 may comprise pools of high-performance processors, as well as cooling features that enhance air cooling to yield a higher thermal envelope of up to 250 W or more. In various embodiments, any given high-performance processing sled 1134 may feature an expansion connector 1117 that can accept a far memory expansion sled, such that the far memory that is locally available to that high-performance processing sled 1134 is disaggregated from the processors and near memory comprised on that sled. In some embodiments, such a high-performance processing sled 1134 may be configured with far memory using an expansion sled that comprises low-latency SSD storage. The optical infrastructure allows for compute resources on one sled to utilize remote accelerator/FPGA, memory, and/or SSD resources that are disaggregated on a sled located on the same rack or any other rack in the data center. The remote resources can be located one switch jump away or two switch jumps away in the spine-leaf network architecture described above with reference to FIG. 5. The embodiments are not limited in this context.

In various embodiments, one or more layers of abstraction may be applied to the physical resources of physical infrastructure 1100A in order to define a virtual infrastructure, such as a software-defined infrastructure 1100B. In some embodiments, virtual computing resources 1136 of software-defined infrastructure 1100B may be allocated to support the provision of cloud services 1140. In various embodiments, particular sets of virtual computing resources 1136 may be grouped for provision to cloud services 1140 in the form of SDI services 1138. Examples of cloud services 1140 may include—without limitation—software as a service (SaaS) services 1142, platform as a service (PaaS) services 1144, and infrastructure as a service (IaaS) services 1146.

In some embodiments, management of software-defined infrastructure 1100B may be conducted using a virtual infrastructure management framework 1150B. In various embodiments, virtual infrastructure management framework 1150B may be designed to implement workload fingerprinting techniques and/or machine-learning techniques in conjunction with managing allocation of virtual computing resources 1136 and/or SDI services 1138 to cloud services 1140. In some embodiments, virtual infrastructure management framework 1150B may use/consult telemetry data in conjunction with performing such resource allocation. In various embodiments, an application/service management framework 1150C may be implemented in order to provide QoS management capabilities for cloud services 1140. The embodiments are not limited in this context.

As shown in FIG. 12, an illustrative system 1210 for assigning workloads among a set of managed nodes 1260 based on a hierarchical model of the managed nodes includes an orchestrator server 1240 in communication with the set of managed nodes 1260. Each managed node 1260 may be embodied as an assembly of resources (e.g., physical resources 206), such as compute resources (e.g., physical compute resources 205-4), storage resources (e.g., physical storage resources 205-1), accelerator resources (e.g., physical accelerator resources 205-2), or other resources (e.g., physical memory resources 205-3) from the same or different sleds (e.g., the sleds 204-1, 204-2, 204-3, 204-4, etc.) or racks (e.g., one or more of racks 302-1 through 302-32). Each managed node 1260 may be established, defined, or “spun up” by the orchestrator server 1240 at the time a workload is to be assigned to the managed node 1260 or at any other time, and may exist regardless of whether any workloads are presently assigned to the managed node 1260. The system 1210 may be implemented in accordance with the data centers 100, 300, 400, 1100 described above with reference to FIGS. 1, 3, 4, and 11. In the illustrative embodiment, the set of managed nodes 1260 includes managed nodes 1250, 1252, and 1254. While three managed nodes 1260 are shown in the set, it should be understood that in other embodiments, the set may include a different number of managed nodes 1260 (e.g., tens of thousands). The system 1210 may be located in a data center and provide storage and compute services (e.g., cloud services) to a client device 1220 that is in communication with the system 1210 through a network 1230. The orchestrator server 1240 may support a cloud operating environment, such as OpenStack, and the managed nodes 1260 may execute one or more applications or processes (i.e., workloads), such as in virtual machines or containers, on behalf of a user of the client device 1220. As discussed in more detail herein, the orchestrator server 1240, in operation, is configured to receive resource allocation objective data indicative of thresholds or goals (“objectives”) to be satisfied during the execution of the workloads (e.g., a target power usage, a target speed at which to execute the workloads, a target temperature of the managed nodes 1260, etc.). Additionally, the orchestrator server 1240 is configured to assign workloads to the managed nodes 1260 and receive telemetry data, which may be embodied as data indicative of the performance and conditions of each managed node 1260 as the managed nodes 1260 execute the workloads assigned to them.

Additionally, in the illustrative embodiment, the orchestrator server 1240 is configured to organize the telemetry data into a hierarchical model that is indicative of a relationship between the managed nodes (e.g., a spatial relationship such as the physical locations of the managed nodes within the data center 1100 and/or a functional relationship, such as groupings of the managed nodes by the customers the nodes provide services for, the types of functions typically performed by the managed nodes, managed nodes that typically share or exchange workloads among each other, etc.). Based on differences in the physical locations and hardware in the managed nodes, a given workload may exhibit different resource utilizations (e.g., cause a different internal temperature, use a different percentage of processor or memory capacity) across different managed nodes 1260. The orchestrator server 1240, in the illustrative embodiment, is configured to determine the differences based on the telemetry data stored in the hierarchical model and factor the differences into a prediction of future resource utilization of a workload if the workload is reassigned from one managed node to another managed node. By taking into account these differences, the orchestrator server 1240 may more accurately balance resource utilization among the workloads and increase the achievement of one or more of the resource allocation objectives without decreasing the achievement of any of the other resource allocation objectives. In the illustrative embodiment, the achievement of a resource allocation objective may be measured, equal to, or otherwise defined as the degree to which a measured value from one or more managed nodes 1260 satisfies a target value associated with the resource allocation objective. For example, in the illustrative embodiment, increasing the achievement may be performed by decreasing the error (e.g., difference) between the measured value (e.g., an operating temperature of a managed node 1260) and the target value (e.g., a target operating temperature). Conversely, decreasing the achievement may be performed by increasing the error (e.g., difference) between the measured value and the target value.
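
By way of a non-limiting illustration, the per-node differences described above may be expressed as simple ratio weights relative to a fleet-average reference. The following Python sketch uses hypothetical names (utilization_weights, predicted_utilization) and a flat list of telemetry samples; it is one possible realization under those assumptions, not the disclosed implementation:

    from collections import defaultdict

    def utilization_weights(samples):
        # samples: iterable of (node_id, workload_id, utilization) tuples,
        # where utilization is, e.g., the fraction of processor capacity used.
        samples = list(samples)
        per_workload = defaultdict(list)
        for node, workload, util in samples:
            per_workload[workload].append(util)
        # Reference utilization: the average across every node that ran it.
        reference = {w: sum(us) / len(us) for w, us in per_workload.items()}
        # A weight above 1.0 means the workload ran "hotter" on that node
        # than on the reference (average) node.
        weights = {(n, w): u / reference[w] for n, w, u in samples}
        return weights, reference

    def predicted_utilization(weights, reference, node, workload):
        # Predict what the workload would use if reassigned to `node`;
        # fall back to the reference value for a node with no history.
        return weights.get((node, workload), 1.0) * reference[workload]

For example, if workload wl-1 used 70% of processor capacity on node-a but averages 60% across the fleet, its weight on node-a is about 1.17, and that factor scales any prediction of its utilization there.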

Referring now to FIG. 13, the orchestrator server 1240 may be embodied as any type of compute device capable of performing the functions described herein, including issuing a request to have cloud services performed, receiving results of the cloud services, assigning workloads to compute devices, analyzing telemetry data indicative of performance and conditions (e.g., resource utilization, one or more temperatures, fan speeds, etc.) as the workloads are executed, generating a hierarchical model indicative of a relationship between the managed nodes and differences in workload resource utilizations among the managed nodes, and adjusting the assignments of the workloads to balance resource utilization and manage the achievement of multiple resource allocation objectives as the workloads are performed, using the hierarchical model. For example, the orchestrator server 1240 may be embodied as a computer, a distributed computing system, one or more sleds (e.g., the sleds 204-1, 204-2, 204-3, 204-4, etc.), a server (e.g., stand-alone, rack-mounted, blade, etc.), a multiprocessor system, a network appliance (e.g., physical or virtual), a desktop computer, a workstation, a laptop computer, a notebook computer, or a processor-based system. As shown in FIG. 13, the illustrative orchestrator server 1240 includes a central processing unit (CPU) 1302, a main memory 1304, an input/output (I/O) subsystem 1306, communication circuitry 1308, and one or more data storage devices 1312. Of course, in other embodiments, the orchestrator server 1240 may include other or additional components, such as those commonly found in a computer (e.g., display, peripheral devices, etc.). Additionally, in some embodiments, one or more of the illustrative components may be incorporated in, or otherwise form a portion of, another component. For example, in some embodiments, the main memory 1304, or portions thereof, may be incorporated in the CPU 1302.

The CPU 1302 may be embodied as any type of processor capable of performing the functions described herein. The CPU 1302 may be embodied as a single or multi-core processor(s), a microcontroller, or other processor or processing/controlling circuit. In some embodiments, the CPU 1302 may be embodied as, include, or be coupled to a field programmable gate array (FPGA), an application specific integrated circuit (ASIC), reconfigurable hardware or hardware circuitry, or other specialized hardware to facilitate performance of the functions described herein. As discussed above, the managed node 1260 may include resources distributed across multiple sleds, and in such embodiments, the CPU 1302 may include portions thereof located on the same sled or on different sleds. Similarly, the main memory 1304 may be embodied as any type of volatile (e.g., dynamic random access memory (DRAM), etc.) or non-volatile memory or data storage capable of performing the functions described herein. In some embodiments, all or a portion of the main memory 1304 may be integrated into the CPU 1302. In operation, the main memory 1304 may store various software and data used during operation such as telemetry data, resource allocation objective data, workload labels, workload classifications, a hierarchical model, workload adjustment data, operating systems, applications, programs, libraries, and drivers. As discussed above, the managed node 1260 may include resources distributed across multiple sleds, and in such embodiments, the main memory 1304 may include portions thereof located on the same sled or on different sleds.

The I/O subsystem 1306 may be embodied as circuitry and/or components to facilitate input/output operations with the CPU 1302, the main memory 1304, and other components of the orchestrator server 1240. For example, the I/O subsystem 1306 may be embodied as, or otherwise include, memory controller hubs, input/output control hubs, integrated sensor hubs, firmware devices, communication links (e.g., point-to-point links, bus links, wires, cables, light guides, printed circuit board traces, etc.), and/or other components and subsystems to facilitate the input/output operations. In some embodiments, the I/O subsystem 1306 may form a portion of a system-on-a-chip (SoC) and be incorporated, along with one or more of the CPU 1302, the main memory 1304, and other components of the orchestrator server 1240, on a single integrated circuit chip.

The communication circuitry 1308 may be embodied as any communication circuit, device, or collection thereof, capable of enabling communications over the network 1230 between the orchestrator server 1240 and another compute device (e.g., the client device 1220 and/or the managed nodes 1260). The communication circuitry 1308 may be configured to use any one or more communication technologies (e.g., wired or wireless communications) and associated protocols (e.g., Ethernet, Bluetooth®, Wi-Fi®, WiMAX, etc.) to effect such communication.

The illustrative communication circuitry 1308 includes a network interface controller (NIC) 1310, which may also be referred to as a host fabric interface (HFI). The NIC 1310 may be embodied as one or more add-in boards, daughtercards, network interface cards, controller chips, chipsets, or other devices that may be used by the orchestrator server 1240 to connect with another compute device (e.g., the client device 1220 and/or the managed nodes 1260). In some embodiments, the NIC 1310 may be embodied as part of a system-on-a-chip (SoC) that includes one or more processors, or included on a multichip package that also contains one or more processors. In some embodiments, the NIC 1310 may include a local processor (not shown) and/or a local memory (not shown) that are both local to the NIC 1310. In such embodiments, the local processor of the NIC 1310 may be capable of performing one or more of the functions of the CPU 1302 described herein. Additionally or alternatively, in such embodiments, the local memory of the NIC 1310 may be integrated into one or more components of the orchestrator server 1240 at the board level, socket level, chip level, and/or other levels. As discussed above, the managed node 1260 may include resources distributed across multiple sleds, and in such embodiments, the communication circuitry 1308 may include portions thereof located on the same sled or on different sleds.

The one or more illustrative data storage devices 1312 may be embodied as any type of devices configured for short-term or long-term storage of data such as, for example, memory devices and circuits, memory cards, hard disk drives, solid-state drives, or other data storage devices. Each data storage device 1312 may include a system partition that stores data and firmware code for the data storage device 1312. Each data storage device 1312 may also include an operating system partition that stores data files and executables for an operating system.

Additionally, the orchestrator server 1240 may include a display 1314. The display 1314 may be embodied as, or otherwise use, any suitable display technology including, for example, a liquid crystal display (LCD), a light emitting diode (LED) display, a cathode ray tube (CRT) display, a plasma display, and/or other display usable in a compute device. The display 1314 may include a touchscreen sensor that uses any suitable touchscreen input technology to detect the user's tactile selection of information displayed on the display including, but not limited to, resistive touchscreen sensors, capacitive touchscreen sensors, surface acoustic wave (SAW) touchscreen sensors, infrared touchscreen sensors, optical imaging touchscreen sensors, acoustic touchscreen sensors, and/or other types of touchscreen sensors.

Additionally or alternatively, the orchestrator server 1240 may include one or more peripheral devices 1316. Such peripheral devices 1316 may include any type of peripheral device commonly found in a compute device such as speakers, a mouse, a keyboard, and/or other input/output devices, interface devices, and/or other peripheral devices.

The client device 1220 and the managed nodes 1260 may have components similar to those described in FIG. 13. The description of those components of the orchestrator server 1240 is equally applicable to the description of components of the client device 1220 and the managed nodes 1260 and is not repeated herein for clarity of the description. Further, it should be appreciated that any of the client device 1220 and the managed nodes 1260 may include other components, sub-components, and devices commonly found in a computing device, which are not discussed above in reference to the orchestrator server 1240 and not discussed herein for clarity of the description.

As described above, the client device 1220, the orchestrator server 1240, and the managed nodes 1260 are illustratively in communication via the network 1230, which may be embodied as any type of wired or wireless communication network, including global networks (e.g., the Internet), local area networks (LANs) or wide area networks (WANs), cellular networks (e.g., Global System for Mobile Communications (GSM), 3G, Long Term Evolution (LTE), Worldwide Interoperability for Microwave Access (WiMAX), etc.), digital subscriber line (DSL) networks, cable networks (e.g., coaxial networks, fiber networks, etc.), or any combination thereof.

Referring now to FIG. 14, in the illustrative embodiment, the orchestrator server 1240 may establish an environment 1400 during operation. The illustrative environment 1400 includes a network communicator 1420, a telemetry monitor 1430, and a resource manager 1440. Each of the components of the environment 1400 may be embodied as hardware, firmware, software, or a combination thereof. As such, in some embodiments, one or more of the components of the environment 1400 may be embodied as circuitry or a collection of electrical devices (e.g., network communicator circuitry 1420, telemetry monitor circuitry 1430, resource manager circuitry 1440, etc.). It should be appreciated that, in such embodiments, one or more of the network communicator circuitry 1420, telemetry monitor circuitry 1430, or resource manager circuitry 1440 may form a portion of one or more of the CPU 1302, the main memory 1304, the I/O subsystem 1306, and/or other components of the orchestrator server 1240. In the illustrative embodiment, the environment 1400 includes telemetry data 1402, which may be embodied as data indicative of the performance and conditions (e.g., resource utilization, operating frequencies, power usage, one or more temperatures, fan speeds, etc.) of each managed node 1260 as the managed nodes 1260 execute the workloads assigned to them. Additionally, the illustrative environment 1400 includes resource allocation objective data 1404 indicative of user-defined thresholds or goals (“objectives”) to be satisfied during the execution of the workloads. In the illustrative embodiment, the objectives pertain to power consumption, life expectancy, heat production, and performance of the components of the managed nodes 1260. Further, the illustrative environment 1400 includes workload labels 1406, which may be embodied as any identifiers (e.g., process numbers, executable file names, alphanumeric tags, etc.) that uniquely identify each workload executed by the managed nodes 1260. In addition, the illustrative environment 1400 includes workload classifications 1408, which may be embodied as any data indicative of the general resource utilization tendencies of each workload (e.g., processor intensive, memory intensive, network bandwidth intensive, etc.).

In the illustrative embodiment, the environment 1400 also includes a hierarchical model 1410, which may be embodied as any data indicative of the telemetry data 1402 associated with each managed node 1260, a relationship between the nodes, such as a spatial relationship or a functional relationship, and differences between the resource utilization of a workload when executed by one managed node as compared to a reference (e.g., an average) resource utilization of the workload. For example, the spatial relationship may be indicative of the physical location of each managed node 1260 (e.g., a location in the data center 1100), such as an access pathway number, rack number, and sled space index, or a set of spatial coordinates (e.g., latitude, longitude, and altitude, or coordinates relative to a known location in the data center). A functional relationship may be embodied as an indication that a particular managed node 1260 is a member of a group of managed nodes 1260 that perform a related set of functions (e.g., functions for a particular customer or class of customers, functions that have similar resource utilization profiles, etc.). The differences between the resource utilization of a workload executed by one managed node 1260 and a reference resource utilization indicative of resource utilization of the workload on a reference managed node 1260 may be embodied as weights (e.g., coefficients) and may be a result of the idiosyncrasies of the hardware components of the managed node 1260 and/or its relationship to the other managed nodes 1260 (e.g., the managed node 1260 is located in a portion of the data center 1100 that typically has an above-average ambient temperature, etc.).
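
As a non-limiting illustration, one possible in-memory shape for such a model is sketched below in Python; the class and field names (NodeEntry, HierarchicalModel, etc.) are hypothetical, and a production model could equally be a tree keyed by data center, rack, and sled:

    from dataclasses import dataclass, field

    @dataclass
    class NodeEntry:
        # One managed node's entry in the hierarchical model.
        node_id: str
        rack: int                                      # spatial: rack number
        sled_space: int                                # spatial: sled space index
        groups: set = field(default_factory=set)       # functional relationships
        weights: dict = field(default_factory=dict)    # workload_id -> coefficient
        telemetry: list = field(default_factory=list)  # recent telemetry samples

    @dataclass
    class HierarchicalModel:
        nodes: dict = field(default_factory=dict)      # node_id -> NodeEntry

        def nodes_in_rack(self, rack):
            # Spatial query: every node housed in a given rack.
            return [n for n in self.nodes.values() if n.rack == rack]

        def nodes_in_group(self, group):
            # Functional query: every node in a given customer/function group.
            return [n for n in self.nodes.values() if group in n.groups]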

Further, the illustrative environment 1400 includes workload adjustment data 1412, which may be embodied as any data indicative of reassignments (e.g., live migrations) of one or more workloads from one managed node 1260 to another managed node 1260 and/or adjustments to settings for components within each managed node 1260, such as target power usage of the components, processor capacity (e.g., a number of cores to be used, a clock speed, a percentage of available processor cycles, etc.) available to one or more workloads, memory resource capacity (e.g., amount of memory to be used and/or frequency of memory accesses to volatile memory and/or non-volatile memory) available to one or more workloads, communication circuitry capacity (e.g., network bandwidth) available to one or more workloads, and/or target operating temperatures and fan speeds.
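
A hypothetical shape for such workload adjustment data is sketched below; every key name and value is illustrative only and not prescribed by the disclosure:

    # Reassignments (e.g., live migrations) plus per-node component settings.
    adjustment = {
        "migrations": [
            {"workload": "wl-42", "source": "node-1250", "target": "node-1252"},
        ],
        "node_settings": {
            "node-1254": {
                "processor": {"cores": 8, "clock_ghz": 2.4, "cycles_pct": 75},
                "memory": {"capacity_gb": 64},
                "network": {"bandwidth_gbps": 10},
                "thermal": {"target_temp_c": 70, "fan_speed_pct": 60},
            },
        },
    }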

In the illustrative environment 1400, the network communicator 1420, which may be embodied as hardware, firmware, software, virtualized hardware, emulated architecture, and/or a combination thereof as discussed above, is configured to facilitate inbound and outbound network communications (e.g., network traffic, network packets, network flows, etc.) to and from the orchestrator server 1240, respectively. To do so, the network communicator 1420 is configured to receive and process data packets from one system or computing device (e.g., the client device 1220) and to prepare and send data packets to another computing device or system (e.g., the managed nodes 1260). Accordingly, in some embodiments, at least a portion of the functionality of the network communicator 1420 may be performed by the communication circuitry 1308, and, in the illustrative embodiment, by the NIC 1310.

The telemetry monitor 1430, which may be embodied as hardware, firmware, software, virtualized hardware, emulated architecture, and/or a combination thereof as discussed above, is configured to collect the telemetry data 1402 from the managed nodes 1260 as the managed nodes 1260 execute the workloads assigned to them. The telemetry monitor 1430 may actively poll each of the managed nodes 1260 for updated telemetry data 1402 on an ongoing basis or may passively receive telemetry data 1402 from the managed nodes 1260, such as by listening on a particular network port for updated telemetry data 1402. The telemetry monitor 1430 may further parse and categorize the telemetry data 1402, such as by separating the telemetry data 1402 into an individual file or data set for each managed node 1260.
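
As a non-limiting sketch, an active-polling variant of the telemetry monitor might look like the following Python; the endpoint URLs, the JSON payload format, and the five-second period are assumptions, not details from the disclosure:

    import json
    import time
    import urllib.request

    POLL_INTERVAL_S = 5.0  # assumed polling period; the disclosure sets none

    def poll_once(node_endpoints, store):
        # Actively poll each managed node for a telemetry snapshot and file
        # it into a per-node data set, one list per node.
        for node_id, url in node_endpoints.items():
            try:
                with urllib.request.urlopen(url, timeout=2.0) as resp:
                    sample = json.load(resp)
            except OSError:
                continue  # skip nodes that do not answer this round
            store.setdefault(node_id, []).append(sample)

    def monitor_loop(node_endpoints, store):
        while True:
            poll_once(node_endpoints, store)
            time.sleep(POLL_INTERVAL_S)

A passive variant would instead listen on a known port and append samples as the nodes push them.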

The resource manager 1440, which may be embodied as hardware, firmware, software, virtualized hardware, emulated architecture, and/or a combination thereof, is configured to generate data analytics from the telemetry data 1402, identify the workloads, classify the workloads, generate the hierarchical model 1410, determine whether the assignment of the workloads can be adjusted in view of the differences in resource utilizations of a workload across different managed nodes 1260 to improve the achievement of at least one resource allocation objective (e.g., increase resource utilization) without adversely affecting the achievement of any other resource allocation objectives, and apply the adjustments. To do so, the resource manager 1440 includes a hierarchical modeler 1442 that includes a workload labeler 1444, a workload classifier 1446, a workload behavior predictor 1448, and a multi-objective analyzer 1450. The hierarchical modeler 1442, in the illustrative embodiment, is configured to generate the hierarchical model 1410 and coordinate the functions of the workload labeler 1444, the workload classifier 1446, the workload behavior predictor 1448, and the multi-objective analyzer 1450 to more accurately determine and apply adjustments to the assignments of the workloads using information indicative of the differences in how a workload affects the resource utilization on one managed node 1260 versus another managed node 1260 that may be located in a different place (e.g., a cooler section of the data center 1100) or include different hardware components (e.g., more memory, a more efficient processor, etc.). The workload labeler 1444, in the illustrative embodiment, is configured to assign a workload label 1406 to each workload presently performed or scheduled to be performed by one or more of the managed nodes 1260. The workload labeler 1444 may generate the workload label 1406 as a function of an executable name of the workload, a hash of all or a portion of the code of the workload, or based on any other method to uniquely identify each workload. The workload classifier 1446, in the illustrative embodiment, is configured to categorize each labeled workload based on the average resource utilization of each workload (e.g., generally utilizes 65% of processor capacity, generally utilizes 40% of memory capacity, etc.).
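
By way of illustration, the labeling and classification steps might be sketched as follows in Python; the hash-based label mirrors one of the strategies named above, while the 50% classification thresholds and function names are assumptions:

    import hashlib

    def workload_label(executable_path):
        # Label a workload by hashing its executable code, one of the
        # labeling strategies described above (an executable name or
        # process number would also serve).
        with open(executable_path, "rb") as f:
            return hashlib.sha256(f.read()).hexdigest()[:16]

    def classify_workload(samples, cpu_threshold=0.5, mem_threshold=0.5):
        # samples: list of dicts like {"cpu": 0.65, "mem": 0.40}.
        avg_cpu = sum(s["cpu"] for s in samples) / len(samples)
        avg_mem = sum(s["mem"] for s in samples) / len(samples)
        classes = []
        if avg_cpu >= cpu_threshold:
            classes.append("processor intensive")
        if avg_mem >= mem_threshold:
            classes.append("memory intensive")
        return classes or ["low utilization"]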

The workload behavior predictor 1448, in the illustrative embodiment, is configured to analyze the telemetry data 1402 to identify different phases of resource utilization within the telemetry data 1402 for each workload. Each resource utilization phase may be embodied as a period of time in which the resource utilization of one or more resources of a managed node satisfies a predefined threshold. For example, a utilization of at least 85% of the available processor capacity may be indicative of a high processor utilization phase, and a utilization of at least 85% of the memory capacity may be indicative of a high memory utilization phase. In the illustrative embodiment, the workload behavior predictor 1448 is further to identify patterns in the resource utilization phases of the workloads (e.g., a high processor utilization phase, followed by a high memory utilization phase, followed by a phase of low resource utilization, which is then followed by the high processor utilization phase again). The workload behavior predictor 1448 may be configured to utilize the identified resource utilization phase patterns to determine a present resource utilization phase of a given workload, predict the next resource utilization phase based on the patterns, and determine an amount of remaining time until the workload transitions to the next resource utilization phase. The multi-objective analyzer 1450, in the illustrative embodiment, is configured to balance the resource allocation objectives defined in the resource allocation objective data 1404, determine, based on the telemetry data 1402, whether the present allocation of the resources in the managed nodes 1260 is Pareto-efficient (e.g., that no adjustment can be made without decreasing the achievement of one or more other resource allocation objectives), and if not, determine an adjustment that provides a Pareto improvement (e.g., an increase in the achievement of at least one of the objectives without decreasing the achievement of any of the other objectives). To do so, in the illustrative embodiment, the multi-objective analyzer 1450 includes a workload placer 1452 and a node settings adjuster 1454.
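
A minimal Python sketch of phase detection and next-phase prediction, using the 85% thresholds from the example above, might look like the following; the phase names and the first-order transition predictor are illustrative assumptions, not the disclosed method:

    from collections import Counter

    HIGH = 0.85  # the 85% threshold from the example above

    def phase_sequence(samples):
        # Collapse a time series of samples like {"cpu": 0.9, "mem": 0.2}
        # into a run-length-collapsed phase sequence, e.g.
        # ["high-cpu", "high-mem", "low", "high-cpu"].
        seq = []
        for s in samples:
            if s["cpu"] >= HIGH:
                phase = "high-cpu"
            elif s["mem"] >= HIGH:
                phase = "high-mem"
            else:
                phase = "low"
            if not seq or seq[-1] != phase:
                seq.append(phase)
        return seq

    def predict_next_phase(seq):
        # Predict the phase that most often followed the current phase in
        # the observed pattern (a simple first-order transition model).
        followers = Counter(b for a, b in zip(seq, seq[1:]) if a == seq[-1])
        return followers.most_common(1)[0][0] if followers else None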

In the illustrative embodiment, the multi-objective analyzer 1450 is configured to determine, as a function of the telemetry data 1402, including the present resource utilizations of the workloads and the predicted behavior of the workloads, whether an adjustment can be made to any of the assignments of the workloads and/or the settings of the components of the managed nodes 1260 to increase the achievement of one or more of the resource allocation objectives without decreasing the achievement of the other resource allocation objectives. In the illustrative embodiment, the multi-objective analyzer 1450 may do so by modeling or simulating the set of managed nodes 1260, including simulating the differences (e.g., weights), represented in the hierarchical model 1410, indicative of how a workload affects the resource utilization on one managed node 1260 versus another managed node 1260, to determine the power consumption, heat generation, compute capacity, and other factors in response to various adjustments to the assignments of workloads and/or the settings of components within the managed nodes 1260, define a Pareto frontier indicative of a set of resource allocations that are all Pareto-efficient, determine whether the present resource allocation is already on the Pareto frontier, and, if not, determine what adjustment to the allocations would reach the Pareto frontier. The multi-objective analyzer 1450 may determine the Pareto frontier, P(Y), as follows:

ƒ: ℝⁿ → ℝᵐ  (Equation 1)

In the above equation, ƒ is a function of the set of managed nodes 1260, modeled by the multi-objective analyzer 1450, that is indicative of the response of the managed nodes 1260 to adjustments to the assignments of workloads. ℝⁿ is a metric space of possible allocations (i.e., assignments of workloads) and ℝᵐ represents a set of criterion vectors. In the following equation, X is a compact set of feasible decisions in the metric space ℝⁿ, and Y is the feasible set of criterion vectors in ℝᵐ:

Y = {y ∈ ℝᵐ : y = ƒ(x), x ∈ X}  (Equation 2)

Furthermore, a point y″, defined in Equation 3 below, dominates another point y′, defined in Equation 4, when y″ is at least as preferable as y′ in every criterion and differs from y′ (written y″ > y′, y″ ≠ y′).

y″ ∈ ℝᵐ  (Equation 3)

y′ ∈ ℝᵐ  (Equation 4)

As such, the Pareto frontier, the set of points of Y that no other point of Y dominates, may be represented as follows:

P(Y) = {y′ ∈ Y : {y″ ∈ Y : y″ > y′, y″ ≠ y′} = ∅}  (Equation 5)
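The dominance test of Equation 5 lends itself to a direct implementation. The following Python sketch, an illustration rather than the claimed method (it assumes a finite set of criterion vectors in which larger values are preferable), computes P(Y):

def dominates(y2, y1):
    # y2 dominates y1: at least as preferable in every criterion, and distinct.
    return all(a >= b for a, b in zip(y2, y1)) and y2 != y1

def pareto_frontier(Y):
    # P(Y): the points of Y that no other point of Y dominates (Equation 5).
    return [y1 for y1 in Y if not any(dominates(y2, y1) for y2 in Y)]

For example, pareto_frontier([(1, 2), (2, 1), (1, 1)]) returns [(1, 2), (2, 1)]; the point (1, 1) is dominated by (1, 2) and is therefore excluded from the frontier.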

The workload placer 1452, in the illustrative embodiment, is configured to initially assign workloads to the various managed nodes 1260 and reassign the workloads among the managed nodes 1260 to provide a Pareto improvement (e.g., an adjustment that improves the achievement of at least one resource allocation objective without decreasing the achievement of any other resource allocation objective). In doing so, the workload placer 1452 may determine reassignments of workloads among the managed nodes 1260 and/or time offsets to apply to workloads to align the timing of the resource utilization phases identified by the workload behavior predictor 1448. Similarly, the node settings adjuster 1454, in the illustrative embodiment, is configured to apply one or more adjustments to the settings within the managed nodes 1260 to provide or restrict the resources available to the workloads in accordance with the determined Pareto improvement. The settings may be associated with the operating system and/or the firmware or drivers of the components of the managed nodes 1260.

It should be appreciated that each of the hierarchical modeler 1442, the workload labeler 1444, the workload classifier 1446, the workload behavior predictor 1448, the multi-objective analyzer 1450, the workload placer 1452, and the node settings adjuster 1454 may be separately embodied as hardware, firmware, software, virtualized hardware, emulated architecture, and/or a combination thereof. For example, the hierarchical modeler 1442 may be embodied as a hardware component, while the workload labeler 1444, the workload classifier 1446, the workload behavior predictor 1448, the multi-objective analyzer 1450, the workload placer 1452, and the node settings adjuster 1454 are embodied as virtualized hardware components or as some other combination of hardware, firmware, software, virtualized hardware, emulated architecture, and/or a combination thereof.

Referring now to FIG. 15, in use, the orchestrator server 1240 may execute a method 1500 for managing the assignment of workloads among the managed nodes 1260 based on the hierarchical model 1410. The method 1500 begins with block 1502, in which the orchestrator server 1240 determines whether to manage workloads. In the illustrative embodiment, the orchestrator server 1240 determines to manage workloads if the orchestrator server 1240 is powered on, in communication with the managed nodes 1260, and has received at least one request from the client device 1220 to provide cloud services (i.e., to perform one or more workloads). In other embodiments, the orchestrator server 1240 may determine whether to manage workloads based on other factors. Regardless, in response to a determination to manage workloads, in the illustrative embodiment, the method 1500 advances to block 1504, in which the orchestrator server 1240 receives resource allocation objective data (e.g., the resource allocation objective data 1404). In doing so, the orchestrator server 1240 may receive the resource allocation objective data 1404 from a user (e.g., an administrator) through a graphical user interface (not shown), from a configuration file, or from another source. In receiving the resource allocation objective data 1404, the orchestrator server 1240 may receive power consumption objective data indicative of a target power usage or threshold amount of power usage of the managed nodes 1260 as they execute the workloads, as indicated in block 1506. The orchestrator server 1240, in the illustrative embodiment, may also receive performance objective data indicative of a target speed at which workloads are to be executed (e.g., a processor clock speed, a memory clock speed, I/O operations per second, a target time period in which to complete execution of a workload, etc.), as indicated in block 1508. Additionally or alternatively, the orchestrator server 1240 may receive reliability objective data indicative of a target life cycle of one or more of the managed nodes 1260 or components therein (e.g., a target life cycle of a data storage device, a target life cycle of a cooling fan, etc.), as indicated in block 1510. As indicated in block 1512, the orchestrator server 1240 may also receive thermal objective data indicative of one or more target temperatures in the managed nodes 1260.
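By way of illustration only, the resource allocation objective data 1404 might be represented as a structure such as the following Python sketch; every field name and value is a hypothetical example rather than a format defined by this disclosure.

objectives = {
    "power":       {"target_watts_per_node": 350},   # block 1506
    "performance": {"target_io_ops_per_s": 50000},   # block 1508
    "reliability": {"fan_life_hours": 40000},        # block 1510
    "thermal":     {"target_temp_celsius": 27},      # block 1512
}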

After receiving the resource allocation objective data 1404, in the illustrative embodiment, the method 1500 advances to block 1514, in which the orchestrator server 1240 assigns initial workloads to the managed nodes 1260. In the illustrative embodiment, the orchestrator server 1240 has not yet received telemetry data 1402 that would inform a decision as to where the workloads are to be assigned among the managed nodes 1260. As such, the orchestrator server 1240 may assign the workloads to the managed nodes 1260 based on any suitable method, such as by assigning each workload to the first available managed node that is idle (i.e., is not presently executing a workload), by randomly assigning the workloads, or by any other method.

Having assigned the workloads, the method 1500 advances to block 1516, in which the orchestrator server 1240 receives telemetry data 1402 from the managed nodes 1260 as the workloads are performed (i.e., executed). As described herein, depending on the idiosyncrasies of the components and location of each managed node 1260, the same workload may affect one managed node 1260 differently than it would affect another managed node 1260. By collecting the telemetry data 1402, the orchestrator server 1240 may later determine the differences in how the same workload affects different managed nodes 1260, as described herein. In receiving the telemetry data 1402, the orchestrator server 1240 may receive power consumption data indicative of an amount of power (e.g., watts) consumed by each managed node 1260, as indicated in block 1518. The orchestrator server 1240 may also receive performance data from each managed node 1260 indicative of a speed at which the workloads are performed, as indicated in block 1520. For example, the performance data may be embodied as an amount of time consumed to complete a function or task associated with a workload. Further, as indicated in block 1522, the orchestrator server 1240 may receive processor utilization data indicative of an amount of processor usage consumed by each workload performed by each managed node 1260. Moreover, as indicated in block 1524, the orchestrator server 1240 may receive memory utilization data for each managed node 1260. The memory utilization data may be embodied as Intel Cache Allocation Technology (CAT) data, Intel Cache Monitoring Technology (CMT) data, Intel Memory Bandwidth Monitoring (MBM) data, and/or other data indicative of an amount or frequency of memory use by each workload performed by each managed node 1260.

In receiving the memory utilization data, the orchestrator server 1240 may receive cache utilization data indicative of a frequency of cache accesses associated with a workload and/or cache miss rate information, as indicated in block 1526. Additionally or alternatively, as indicated in block 1528, in receiving the memory utilization data, the orchestrator server 1240 may receive volatile memory utilization data indicative of an amount of volatile memory (e.g., the main memory 1304) used, a frequency of accesses to the volatile memory, page fault data, and/or other information indicative of the utilization of the volatile memory within each managed node 1260. The orchestrator server 1240 may additionally or alternatively receive non-volatile memory utilization data indicative of the amount of data stored in and/or retrieved from the data storage devices 1312 and/or a frequency at which each workload issues write requests and/or read requests to the data storage devices 1312 in each managed node 1260, as indicated in block 1530.

In receiving the telemetry data 1402, the orchestrator server 1240 may additionally or alternatively receive network utilization data indicative of an amount of network bandwidth (e.g., capacity of the communication circuitry) used by each workload performed by each managed node 1260, as indicated in block 1532. The orchestrator server 1240 may additionally receive temperature data from each managed node 1260 indicative of one or more temperatures within the managed nodes 1260, as indicated in block 1534. After receiving the telemetry data 1402, the method 1500 advances to block 1536 of FIG. 16, in which the orchestrator server 1240 generates the hierarchical model 1410 from the telemetry data 1402 as the workloads are being performed.
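For illustration, a single telemetry report covering blocks 1518 through 1534 might take a shape similar to the following Python sketch; the field names and values are assumptions made for the example, not a format defined by this disclosure.

telemetry_record = {
    "node_id": "10.0.3.17",      # IP, MAC, or other unique identifier
    "power_watts": 312.5,        # power consumption data (block 1518)
    "task_seconds": 0.84,        # performance data (block 1520)
    "cpu_utilization": 0.72,     # processor utilization data (block 1522)
    "cache_miss_rate": 0.031,    # cache utilization data (block 1526)
    "memory_utilization": 0.58,  # volatile memory utilization data (block 1528)
    "storage_iops": 1250,        # non-volatile memory utilization data (block 1530)
    "network_gbps": 3.2,         # network utilization data (block 1532)
    "inlet_temp_c": 29.0,        # temperature data (block 1534)
}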

Referring now to FIG. 16, in generating the hierarchical model 1410, the orchestrator server 1240 identifies each managed node 1260 associated with the telemetry data 1402, as indicated in block 1538. In the illustrative embodiment, the orchestrator server 1240 may associate an identifier (e.g., an Internet Protocol (IP) address, a media access control (MAC) address, or other unique identifier) included in the telemetry data 1402 with each corresponding managed node 1260. Additionally, in the illustrative embodiment, the orchestrator server 1240 stores the telemetry data 1402 in the hierarchical model 1410 in a format indicative of a relationship between the managed nodes 1260. In doing so, as indicated in block 1542, the orchestrator server 1240 may store the telemetry data 1402 in the hierarchical model 1410 in a format indicative of spatial locations of the managed nodes 1260. For example, and with reference to the diagram 1800 shown in FIG. 18, the hierarchical model 1410 may be organized with the orchestrator server 1240 as a root 1810, with access pathways 1820, 1822 associated with the root, racks 1830, 1832 associated with the access pathways 1820, 1822, and sled space indexes 1840, 1842 associated with the racks 1830, 1832. Alternatively, the spatial relationship may include the data center 1100 as a root and iteratively smaller subsections of the data center 1100 connected in a hierarchy. In the illustrative embodiment, the orchestrator server 1240 stores the telemetry data 1402 for each managed node 1260 in association with the location of the managed node 1260 in the hierarchy (e.g., stored in association with the entry indicative of the sled space index of the corresponding managed node 1260, an entry indicative of spatial coordinates where the corresponding managed node 1260 is physically located, etc.). The orchestrator server 1240 may alternatively store the telemetry data 1402 in the hierarchical model 1410 in a format indicative of a membership of each managed node 1260 in a corresponding functional group, as indicated in block 1544. Each functional group may be embodied as a set of managed nodes 1260 associated with a unique identifier (e.g., an alphanumeric code) that perform similar functions (e.g., encryption, compression, etc.), perform different functions for the same customer or class of customer (e.g., a customer who paid for a particular quality of service (QoS)), or share another functional relationship. The storage of the telemetry data 1402 in the format indicative of the relationship between the managed nodes 1260 is the beginning of the formation of the hierarchical model 1410. In the illustrative embodiment, the orchestrator server 1240 subsequently generates data analytics that enable the orchestrator server 1240 to add additional data (e.g., weights 1850 shown in FIG. 18) to the hierarchical model 1410 indicative of the differences in how each managed node 1260 is affected by workloads, as described herein.
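A minimal Python sketch of the spatial hierarchy of FIG. 18 follows, assuming a nested-dictionary representation (the representation and all names are assumptions for illustration): telemetry is stored at the leaf for the managed node's physical location so that per-location weights 1850 can be attached later.

hierarchy = {"root": "orchestrator-1240", "pathways": {}}

def store_telemetry(h, pathway, rack, sled, record):
    # Walk root -> access pathway -> rack -> sled space, creating levels as
    # needed, and append the telemetry record at the leaf.
    leaf = (h["pathways"].setdefault(pathway, {})
            .setdefault(rack, {})
            .setdefault(sled, {"telemetry": [], "weights": {}}))
    leaf["telemetry"].append(record)
    return leaf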

As indicated in block 1546, the orchestrator server 1240, in the illustrative embodiment, generates data analytics as the workloads are performed. In generating the data analytics, the orchestrator server 1240, in the illustrative embodiment, generates profiles of the workloads, as indicated in block 1548. In doing so, in the illustrative embodiment, the orchestrator server 1240 generates the labels 1406 for the workloads to uniquely identify each workload, as indicated in block 1550. Additionally, in the illustrative embodiment, the orchestrator server 1240 generates the classifications 1408 of the workloads, as indicated in block 1552. Further, in the illustrative embodiment, the orchestrator server 1240 identifies the patterns in the resource utilization phases for each of the workloads, as indicated in block 1554. In doing so, the orchestrator server 1240 may determine that a particular workload experiences a phase of high processor utilization and low memory utilization that is typically followed by a phase of low processor utilization and high memory utilization, and that another workload experiences similar phases, but at a different frequency or at a time offset from the other workload.

Additionally, as indicated in block 1556, the orchestrator server 1240 may predict future resource utilization phases of the workloads, such as by comparing a present resource utilization of each workload to the identified patterns in the resource utilization phases to determine the present phase of each workload, and then identifying the upcoming phases of the workloads from the patterns. In doing so, as indicated in block 1558, the orchestrator server 1240 may predict future contention for resources. For example, the orchestrator server 1240 may identify two workloads executed concurrently by the same managed node 1260 that are predicted to both enter high processor utilization phases, causing contention for that resource of the managed node 1260. Similarly, the orchestrator server 1240 may identify workloads executed by a particular managed node 1260 that are predicted to concurrently enter phases of high memory utilization or high network bandwidth utilization, causing contention for those resources. In doing so, the orchestrator server 1240 may determine whether, at the heightened resource utilization, the available capacity of the resource subject to contention will be less than the amount requested by the workloads executed by that managed node 1260, and if so, store an indicator to potentially adjust the assignment of those workloads among the managed nodes 1260. In block 1560, the orchestrator server 1240 stores the data analytics in association with each of the managed nodes 1260 represented in the hierarchical model 1410.
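The contention prediction of block 1558 reduces to an oversubscription test, sketched below in Python under the simplifying assumption that predicted demands are expressed as fractions of a single resource's capacity (the names are hypothetical).

def predict_contention(predicted_demands, capacity=1.0):
    # True when workloads predicted to peak concurrently would together
    # request more of the resource than the managed node can supply.
    return sum(predicted_demands) > capacity

For example, two workloads each predicted to need 60% of a node's processor capacity at the same time yield predict_contention([0.6, 0.6]) == True, so an indicator to potentially reassign one of them would be stored.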

In block 1562, the orchestrator server 1240, in the illustrative embodiment, determines differences in the data analytics for each workload as a function of the managed node 1260 that performed the workload. For example, while a particular labeled workload, or workloads that have the same classification 1408, should exhibit the same resource utilization on any given managed node 1260 if the managed nodes 1260 were identical and located in exactly the same place, the data analytics may indicate that the resource utilizations actually differed (e.g., due to differences in the components of the managed nodes 1260 and/or their relationship to other managed nodes 1260, such as being near other managed nodes that generate an above-average amount of heat). The orchestrator server 1240, in the illustrative embodiment, may determine a reference amount of resource utilization (e.g., an average across the managed nodes 1260 that performed the workload) for both the classification (e.g., the general resource utilization) of the workload and each resource utilization phase of the workload, and then determine differences from those reference amounts based on the telemetry data reported by each managed node 1260 that executed the workload. As indicated in block 1564, the orchestrator server 1240 may then store the determined differences as weights 1850 (e.g., coefficients) in the hierarchical model 1410 in association with each managed node 1260 represented therein. As an example, one managed node 1260 located in a particularly hot area of the data center 1100 (e.g., near managed nodes that perform high processor utilization workloads) may report a higher fan utilization than another managed node 1260 located in a relatively cooler section of the data center 1100 for the same workload (e.g., having the same workload label 1406) or the same workload classification (e.g., having the same workload classification 1408). In such an example, the orchestrator server 1240 may store a fan utilization weight of 1.2 for the managed node 1260 in the hot area of the data center and a fan utilization weight of 0.8 for the other managed node 1260 in the cooler area.
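One simple way to derive such weights, consistent with the 1.2/0.8 example above, is to divide each node's observed utilization for a workload by the reference (average) utilization across the nodes that performed it; the following Python sketch is illustrative only.

def node_weights(per_node_utilization):
    # per_node_utilization: node id -> observed utilization for one workload.
    reference = sum(per_node_utilization.values()) / len(per_node_utilization)
    return {node: u / reference for node, u in per_node_utilization.items()}

For example, fan utilizations of {"hot-node": 0.60, "cool-node": 0.40} give a reference of 0.50 and therefore weights of 1.2 and 0.8, matching the example above.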

After generating the data analytics, the method 1500 advances to block 1566, in which the orchestrator server 1240 determines whether to adjust the workload assignments. In doing so, the orchestrator server 1240, in the illustrative embodiment, determines whether the workload assignments are Pareto-efficient, as indicated in block 1568. In the illustrative embodiment, in determining whether the workload assignments are Pareto-efficient, the orchestrator server 1240 determines whether an adjustment can be made to the workload assignments to increase the achievement of a resource allocation objective (e.g., a target performance) without decreasing the achievement of any other resource allocation objective (e.g., a target temperature, a target power consumption, etc.). As described above, the orchestrator server 1240 may determine the Pareto frontier based on a model of the reactions of the managed nodes 1260 to adjustments to the assignment of workloads, and determine whether the present state of the allocation of resources is already on the Pareto frontier. Afterwards, the method 1500 advances to block 1570 of FIG. 17, in which the orchestrator server 1240 determines the subsequent steps based on whether the orchestrator server 1240 determined to adjust the workload assignments.

Referring now to FIG. 17, if the orchestrator server 1240 determines not to adjust the workload assignments, the method 1500 loops back to block 1516 of FIG. 15, in which the orchestrator server 1240 again receives telemetry data from the managed nodes 1260 as the workloads continue to be performed. Otherwise, the method 1500 advances to block 1572, in which the orchestrator server 1240 determines adjustments to the workload assignments.

In block 1572, the illustrative orchestrator server 1240 determines, as a function of the data analytics and the resource allocation objective data 1404, adjustments to the workload assignments as the workloads are performed. In doing so, the orchestrator server 1240 determines one or more adjustments to improve the achievement of at least one of the resource allocation objectives without decreasing the achievement of any of the other resource allocation objectives, as indicated in block 1574. As indicated in block 1576, the orchestrator server 1240 determines workload reassignments to move at least one workload from one managed node 1260 to another managed node 1260. In doing so, the orchestrator server 1240 predicts, as a function of the node-specific weights determined in block 1564 of FIG. 16, a change in the resource utilization of a workload to be reassigned from one managed node 1260 (e.g., a source managed node 1260) to another managed node 1260 (e.g., a destination managed node 1260). For example, if the hierarchical model 1410 indicates that the source managed node 1260 has a weight of 1.6 associated with processor utilization (e.g., because the processor in the source managed node 1260 is less efficient than an average processor in the managed nodes 1260), and the hierarchical model 1410 indicates that the destination managed node 1260 has a weight of 1 associated with processor utilization (e.g., because the processor in the destination managed node 1260 has an average efficiency compared to the other managed nodes 1260), then the orchestrator server 1240 may predict that the processor utilization for the workload will decrease by approximately 38% when the workload is reassigned to the destination managed node 1260, because the utilization scales by the ratio of the destination weight to the source weight (1/1.6 ≈ 0.62).
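The prediction in the preceding paragraph reduces to a one-line calculation, sketched below in Python for illustration (the helper name is hypothetical): the utilization scales by the ratio of the destination weight to the source weight.

def predicted_change(source_weight, dest_weight):
    # Fractional change in resource utilization after reassignment.
    return dest_weight / source_weight - 1.0

Here, predicted_change(1.6, 1.0) returns -0.375, i.e., the decrease of approximately 38% described above.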

The orchestrator server 1240 may additionally determine node-specific adjustments, as indicated in block 1580. The node-specific adjustments may be embodied as changes to settings within one or more of the managed nodes 1260, such as in the operating system, the drivers, and/or the firmware of components (e.g., the CPU 1302, the memory 1304, the communication circuitry 1308, the one or more data storage devices 1312, etc.) to improve resource utilization. As such, in the illustrative embodiment, in determining the node-specific adjustments, the orchestrator server 1240 may determine processor throttle adjustments, such as clock speed and/or processor affinity for one or more workloads, as indicated in block 1582. Additionally or alternatively, the orchestrator server 1240 may determine memory usage adjustments, such as allocations of volatile memory (e.g., the memory 1304) and/or data storage capacity (e.g., capacity of the one or more data storage devices 1312), memory bus speeds, and/or other memory-related settings, as indicated in block 1584. Additionally or alternatively, the orchestrator server 1240 may determine one or more fan speed adjustments to increase or decrease the cooling within the managed node 1260, as indicated in block 1586. Additionally or alternatively, the orchestrator server 1240 may determine network bandwidth adjustments, such as an available bandwidth of the communication circuitry 1308 to be allocated to each workload in the managed node 1260, as indicated in block 1588.
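Purely as an illustration of the kinds of node-specific adjustments enumerated above, such settings might be conveyed in a structure like the following Python sketch; every field name and value is a hypothetical example.

adjustments = {
    "processor": {"max_clock_ghz": 2.4, "affinity": [0, 1, 2, 3]},  # block 1582
    "memory":    {"allocation_gb": 48, "bus_mhz": 2933},            # block 1584
    "fan":       {"speed_percent": 70},                             # block 1586
    "network":   {"per_workload_gbps": 2.5},                        # block 1588
}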

After determining the adjustments, the method 1500 advances to block 1590, in which the orchestrator server 1240 applies the determined adjustments. In doing so, the orchestrator server 1240 may issue one or more requests to perform a live migration of a workload between two managed nodes 1260 (i.e., a workload reassignment), as indicated in block 1592. In the illustrative embodiment, the migration is live because, rather than waiting until the workloads have been completed to analyze the telemetry data 1402, the orchestrator server 1240 collects and analyzes the telemetry data 1402 and makes adjustments online (i.e., as the workloads are being performed), as described above. The orchestrator server 1240 may also issue requests to suspend or resume execution of workloads, as indicated in block 1594. In doing so, the orchestrator server 1240 may suspend the execution of a workload until a determined time offset has elapsed, then resume execution, thereby shifting a resource utilization phase in time to align with a complementary resource utilization phase of another workload executed on the same managed node 1260. In the illustrative embodiment, complementary resource utilization phases may be embodied as two or more resource utilization phases of different workloads that primarily use different resources of a managed node 1260, such that concurrent execution of the phases does not cause resource contention. For example, a high processor and low memory utilization phase may be complementary with a low processor and high memory utilization phase. In applying the determined adjustments, as indicated in block 1596, the orchestrator server 1240 may also issue one or more requests to one or more of the managed nodes 1260 to apply the node-specific adjustments described above with reference to block 1580. After applying the adjustments, the method 1500 loops back to block 1516 of FIG. 15, in which the orchestrator server 1240 receives additional telemetry data 1402 from the managed nodes 1260.
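A minimal sketch of the complementarity test described above follows; it assumes the phase names from the earlier sketches and is illustrative only.

def complementary(phase_a, phase_b):
    # Phases that primarily use different resources can execute concurrently
    # without contention, e.g., a predominantly-processor phase paired with a
    # predominantly-memory phase.
    return {phase_a, phase_b} == {"high_cpu", "high_memory"}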

EXAMPLES

Illustrative examples of the technologies disclosed herein are provided below. An embodiment of the technologies may include any one or more, and any combination of, the examples described below.

Example 1 includes an orchestrator server to allocate resources of a set of managed nodes to workloads with a hierarchical model, the orchestrator server comprising one or more processors; one or more memory devices having stored therein a plurality of instructions that, when executed by the one or more processors, cause the orchestrator server to receive resource allocation objective data indicative of multiple resource allocation objectives to be satisfied; determine an initial assignment of a set of workloads among the managed nodes; receive telemetry data from the managed nodes, wherein the telemetry data is indicative of resource utilization by the managed nodes as the workloads are performed; generate, as a function of the telemetry data, a hierarchical model indicative of the resource utilization of each managed node and a relationship between the managed nodes; determine, with the hierarchical model, differences in resource utilization for each workload as a function of the managed node that performed the workload; determine, as a function of the telemetry data and the determined differences, an adjustment to the assignment of the workloads to increase an achievement of at least one of the resource allocation objectives without decreasing the achievement of any of the other resource allocation objectives; and apply the adjustment to the assignments of the workloads among the managed nodes as the workloads are performed.

Example 2 includes the subject matter of Example 1, and wherein to generate a hierarchical model comprises to generate a hierarchical model indicative of a spatial relationship between the managed nodes.

Example 3 includes the subject matter of any of Examples 1 and 2, and wherein to generate a hierarchical model comprises to generate a hierarchical model indicative of a membership of each managed node in a functional group.

Example 4 includes the subject matter of any of Examples 1-3, and wherein the plurality of instructions, when executed, further cause the orchestrator server to determine node-specific weights indicative of the determined differences in resource utilization; and wherein to determine the adjustment to the workload assignments comprises to predict a change in resource utilization of a workload as a function of the node-specific weights associated with two different managed nodes.

Example 5 includes the subject matter of any of Examples 1-4, and wherein the plurality of instructions, when executed, further cause the orchestrator server to generate data analytics as a function of the telemetry data; and store the data analytics in association with each managed node in the hierarchical model.

Example 6 includes the subject matter of any of Examples 1-5, and wherein to generate the data analytics comprises to generate profiles of the workloads, wherein the profiles are indicative of an identity of each workload and a resource usage classification of each workload.

Example 7 includes the subject matter of any of Examples 1-6, and wherein to generate the data analytics comprises to predict future resource utilization of the workloads.

Example 8 includes the subject matter of any of Examples 1-7, and wherein to predict future resource utilization of the workloads comprises to identify potential contention for resources.

Example 9 includes the subject matter of any of Examples 1-8, and wherein to receive resource allocation objective data comprises to receive two or more of power consumption objective data indicative of a target power usage of one or more of the managed nodes, performance objective data indicative of a target speed at which to perform the workloads, reliability objective data indicative of a target life cycle of one or more of the managed nodes, or thermal objective data indicative of a target temperature of one or more of the managed nodes.

Example 10 includes the subject matter of any of Examples 1-9, and wherein to receive telemetry data from the managed nodes comprises to receive at least one of power consumption data indicative of an amount of power consumed by each managed node, performance data indicative of a speed at which the workloads are executed by each managed node, temperature data indicative of a temperature within each managed node, processor utilization data indicative of an amount of processor usage consumed by each workload performed by each managed node, memory utilization data indicative of an amount or frequency of memory use by each workload performed by each managed node, or network utilization data indicative of an amount of network bandwidth used by each workload performed by each managed node.

Example 11 includes the subject matter of any of Examples 1-10, and wherein the plurality of instructions, when executed, further cause the orchestrator server to determine whether the assignment of the workloads is Pareto-efficient; and wherein to determine an adjustment to the assignment of the workloads comprises to determine, in response to a determination that the assignment of the workloads is not Pareto-efficient, an adjustment to the assignment of the workloads.

Example 12 includes the subject matter of any of Examples 1-11, and wherein to determine the adjustments comprises to determine one or more node-specific adjustments indicative of changes to an availability of one or more resources of at least one of the managed nodes to one or more of the workloads performed by the managed node.

Example 13 includes the subject matter of any of Examples 1-12, and wherein to determine the node-specific adjustments comprises to determine at least one of a processor throttle adjustment, a memory usage adjustment, a network bandwidth adjustment, or a fan speed adjustment.

Example 14 includes the subject matter of any of Examples 1-13, and wherein to apply the determined adjustments comprises to issue a request to perform a live migration of a workload between the managed nodes.

Example 15 includes the subject matter of any of Examples 1-14, and wherein to apply the determined adjustments comprises to issue a request to one of the managed nodes to apply one or more node-specific adjustments indicative of changes to an availability of one or more resources of the managed node to one or more of the workloads performed by the managed node.

Example 16 includes a method for allocating resources of a set of managed nodes to workloads with a hierarchical model, the method comprising receiving, by an orchestrator server, resource allocation objective data indicative of multiple resource allocation objectives to be satisfied; determining, by the orchestrator server, an initial assignment of a set of workloads among the managed nodes; receiving, by the orchestrator server, telemetry data from the managed nodes, wherein the telemetry data is indicative of resource utilization by the managed nodes as the workloads are performed; generating, by the orchestrator server and as a function of the telemetry data, a hierarchical model indicative of the resource utilization of each managed node and a relationship between the managed nodes; determining, by the orchestrator server and with the hierarchical model, differences in resource utilization for each workload as a function of the managed node that performed the workload; determining, by the orchestrator server and as a function of the telemetry data and the determined differences, an adjustment to the assignment of the workloads to increase an achievement of at least one of the resource allocation objectives without decreasing the achievement of any of the other resource allocation objectives; and applying, by the orchestrator server, the adjustment to the assignments of the workloads among the managed nodes as the workloads are performed.

Example 17 includes the subject matter of Example 16, and wherein generating a hierarchical model comprises generating a hierarchical model indicative of a spatial relationship between the managed nodes.

Example 18 includes the subject matter of any of Examples 16 and 17, and wherein generating a hierarchical model comprises generating a hierarchical model indicative of a membership of each managed node in a functional group.

Example 19 includes the subject matter of any of Examples 16-18, and further including determining, by the orchestrator server, node-specific weights indicative of the determined differences in resource utilization, wherein determining the adjustment to the workload assignments comprises predicting a change in resource utilization of a workload as a function of the node-specific weights associated with two different managed nodes.

Example 20 includes the subject matter of any of Examples 16-19, and further including generating, by the orchestrator server, data analytics as a function of the telemetry data; and storing, by the orchestrator server, the data analytics in association with each managed node in the hierarchical model.

Example 21 includes the subject matter of any of Examples 16-20, and wherein generating the data analytics comprises generating profiles of the workloads, wherein the profiles are indicative of an identity of each workload and a resource usage classification of each workload.

Example 22 includes the subject matter of any of Examples 16-21, and wherein generating the data analytics comprises predicting future resource utilization of the workloads.

Example 23 includes the subject matter of any of Examples 16-22, and wherein predicting future resource utilization of the workloads comprises identifying potential contention for resources.

Example 24 includes the subject matter of any of Examples 16-23, and wherein receiving resource allocation objective data comprises receiving two or more of power consumption objective data indicative of a target power usage of one or more of the managed nodes, performance objective data indicative of a target speed at which to perform the workloads, reliability objective data indicative of a target life cycle of one or more of the managed nodes, or thermal objective data indicative of a target temperature of one or more of the managed nodes.

Example 25 includes the subject matter of any of Examples 16-24, and wherein receiving telemetry data from the managed nodes comprises receiving at least one of power consumption data indicative of an amount of power consumed by each managed node, performance data indicative of a speed at which the workloads are executed by each managed node, temperature data indicative of a temperature within each managed node, processor utilization data indicative of an amount of processor usage consumed by each workload performed by each managed node, memory utilization data indicative of an amount or frequency of memory use by each workload performed by each managed node, or network utilization data indicative of an amount of network bandwidth used by each workload performed by each managed node.

Example 26 includes the subject matter of any of Examples 16-25, and further including determining, by the orchestrator server, whether the assignment of the workloads is Pareto-efficient, wherein determining an adjustment to the assignment of the workloads comprises determining, in response to a determination that the assignment of the workloads is not Pareto-efficient, an adjustment to the assignment of the workloads.

Example 27 includes the subject matter of any of Examples 16-26, and wherein determining the adjustments comprises determining one or more node-specific adjustments indicative of changes to an availability of one or more resources of at least one of the managed nodes to one or more of the workloads performed by the managed node.

Example 28 includes the subject matter of any of Examples 16-27, and wherein determining the node-specific adjustments comprises determining at least one of a processor throttle adjustment, a memory usage adjustment, a network bandwidth adjustment, or a fan speed adjustment.

Example 29 includes the subject matter of any of Examples 16-28, and wherein applying the determined adjustments comprises issuing a request to perform a live migration of a workload between the managed nodes.

Example 30 includes the subject matter of any of Examples 16-29, and wherein applying the determined adjustments comprises issuing a request to one of the managed nodes to apply one or more node-specific adjustments indicative of changes to an availability of one or more resources of the managed node to one or more of the workloads performed by the managed node.

Example 31 includes one or more machine-readable storage media comprising a plurality of instructions stored thereon that, in response to being executed, cause an orchestrator server to perform the method of any of Examples 16-30.

Example 32 includes an orchestrator server to allocate resources of a set of managed nodes to workloads with a hierarchical model, the orchestrator server comprising one or more processors; one or more memory devices having stored therein a plurality of instructions that, when executed by the one or more processors, cause the orchestrator server to perform the method of any of Examples 16-30.

Example 33 includes an orchestrator server to allocate resources of a set of managed nodes to workloads with a hierarchical model, the orchestrator server comprising means for performing the method of any of Examples 16-30.

Example 34 includes an orchestrator server to allocate resources of a set of managed nodes to workloads with a hierarchical model, the orchestrator server comprising resource manager circuitry to receive resource allocation objective data indicative of multiple resource allocation objectives to be satisfied and determine an initial assignment of a set of workloads among the managed nodes; and telemetry monitor circuitry to receive telemetry data from the managed nodes, wherein the telemetry data is indicative of resource utilization by the managed nodes as the workloads are performed; wherein the resource manager circuitry is further to generate, as a function of the telemetry data, a hierarchical model indicative of the resource utilization of each managed node and a relationship between the managed nodes, determine, with the hierarchical model, differences in resource utilization for each workload as a function of the managed node that performed the workload, determine, as a function of the telemetry data and the determined differences, an adjustment to the assignment of the workloads to increase an achievement of at least one of the resource allocation objectives without decreasing the achievement of any of the other resource allocation objectives, and apply the adjustment to the assignments of the workloads among the managed nodes as the workloads are performed.

Example 35 includes the subject matter of Example 34, and wherein to generate a hierarchical model comprises to generate a hierarchical model indicative of a spatial relationship between the managed nodes.

Example 36 includes the subject matter of any of Examples 34 and 35, and wherein to generate a hierarchical model comprises to generate a hierarchical model indicative of a membership of each managed node in a functional group.

Example 37 includes the subject matter of any of Examples 34-36, and wherein the resource manager circuitry is further to determine node-specific weights indicative of the determined differences in resource utilization; and wherein to determine the adjustment to the workload assignments comprises to predict a change in resource utilization of a workload as a function of the node-specific weights associated with two different managed nodes.

Example 38 includes the subject matter of any of Examples 34-37, and wherein the resource manager circuitry is further to generate data analytics as a function of the telemetry data; and store the data analytics in association with each managed node in the hierarchical model.

Example 39 includes the subject matter of any of Examples 34-38, and wherein to generate the data analytics comprises to generate profiles of the workloads, wherein the profiles are indicative of an identity of each workload and a resource usage classification of each workload.

Example 40 includes the subject matter of any of Examples 34-39, and wherein to generate the data analytics comprises to predict future resource utilization of the workloads.

Example 41 includes the subject matter of any of Examples 34-40, and wherein to predict future resource utilization of the workloads comprises to identify potential contention for resources.

Example 42 includes the subject matter of any of Examples 34-41, and wherein to receive resource allocation objective data comprises to receive two or more of power consumption objective data indicative of a target power usage of one or more of the managed nodes, performance objective data indicative of a target speed at which to perform the workloads, reliability objective data indicative of a target life cycle of one or more of the managed nodes, or thermal objective data indicative of a target temperature of one or more of the managed nodes.

Example 43 includes the subject matter of any of Examples 34-42, and wherein to receive telemetry data from the managed nodes comprises to receive at least one of power consumption data indicative of an amount of power consumed by each managed node, performance data indicative of a speed at which the workloads are executed by each managed node, temperature data indicative of a temperature within each managed node, processor utilization data indicative of an amount of processor usage consumed by each workload performed by each managed node, memory utilization data indicative of an amount or frequency of memory use by each workload performed by each managed node, or network utilization data indicative of an amount of network bandwidth used by each workload performed by each managed node.

Example 44 includes the subject matter of any of Examples 34-43, and wherein the resource manager circuitry is further to determine whether the assignment of the workloads is Pareto-efficient; and wherein to determine an adjustment to the assignment of the workloads comprises to determine, in response to a determination that the assignment of the workloads is not Pareto-efficient, an adjustment to the assignment of the workloads.

Example 45 includes the subject matter of any of Examples 34-44, and wherein to determine the adjustments comprises to determine one or more node-specific adjustments indicative of changes to an availability of one or more resources of at least one of the managed nodes to one or more of the workloads performed by the managed node.

Example 46 includes the subject matter of any of Examples 34-45, and wherein to determine the node-specific adjustments comprises to determine at least one of a processor throttle adjustment, a memory usage adjustment, a network bandwidth adjustment, or a fan speed adjustment.

Example 47 includes the subject matter of any of Examples 34-46, and wherein to apply the determined adjustments comprises to issue a request to perform a live migration of a workload between the managed nodes.

Example 48 includes the subject matter of any of Examples 34-47, and wherein to apply the determined adjustments comprises to issue a request to one of the managed nodes to apply one or more node-specific adjustments indicative of changes to an availability of one or more resources of the managed node to one or more of the workloads performed by the managed node.

Example 49 includes an orchestrator server to allocate resources of a set of managed nodes to workloads with a hierarchical model, the orchestrator server comprising circuitry for receiving resource allocation objective data indicative of multiple resource allocation objectives to be satisfied; circuitry for determining an initial assignment of a set of workloads among the managed nodes; circuitry for receiving telemetry data from the managed nodes, wherein the telemetry data is indicative of resource utilization by the managed nodes as the workloads are performed; means for generating, as a function of the telemetry data, a hierarchical model indicative of the resource utilization of each managed node and a relationship between the managed nodes; means for determining, with the hierarchical model, differences in resource utilization for each workload as a function of the managed node that performed the workload; means for determining, as a function of the telemetry data and the determined differences, an adjustment to the assignment of the workloads to increase an achievement of at least one of the resource allocation objectives without decreasing the achievement of any of the other resource allocation objectives; and means for applying the adjustment to the assignments of the workloads among the managed nodes as the workloads are performed.

Example 50 includes the subject matter of Example 49, and wherein the means for generating a hierarchical model comprises means for generating a hierarchical model indicative of a spatial relationship between the managed nodes.

Example 51 includes the subject matter of any of Examples 49 and 50, and wherein the means for generating a hierarchical model comprises means for generating a hierarchical model indicative of a membership of each managed node in a functional group.

Example 52 includes the subject matter of any of Examples 49-51, and further including means for determining node-specific weights indicative of the determined differences in resource utilization, wherein the means for determining the adjustment to the workload assignments comprises means for predicting a change in resource utilization of a workload as a function of the node-specific weights associated with two different managed nodes.

Example 53 includes the subject matter of any of Examples 49-52, and further including means for generating data analytics as a function of the telemetry data; and means for storing the data analytics in association with each managed node in the hierarchical model.

Example 54 includes the subject matter of any of Examples 49-53, and wherein the means for generating the data analytics comprises means for generating profiles of the workloads, wherein the profiles are indicative of an identity of each workload and a resource usage classification of each workload.

Example 55 includes the subject matter of any of Examples 49-54, and wherein the means for generating the data analytics comprises means for predicting future resource utilization of the workloads.

Example 56 includes the subject matter of any of Examples 49-55, and wherein the means for predicting future resource utilization of the workloads comprises means for identifying potential contention for resources.

Example 57 includes the subject matter of any of Examples 49-56, and wherein the circuitry for receiving resource allocation objective data comprises circuitry for receiving two or more of power consumption objective data indicative of a target power usage of one or more of the managed nodes, performance objective data indicative of a target speed at which to perform the workloads, reliability objective data indicative of a target life cycle of one or more of the managed nodes, or thermal objective data indicative of a target temperature of one or more of the managed nodes.

Example 58 includes the subject matter of any of Examples 49-57, and wherein the circuitry for receiving telemetry data from the managed nodes comprises circuitry for receiving at least one of power consumption data indicative of an amount of power consumed by each managed node, performance data indicative of a speed at which the workloads are executed by each managed node, temperature data indicative of a temperature within each managed node, processor utilization data indicative of an amount of processor usage consumed by each workload performed by each managed node, memory utilization data indicative of an amount or frequency of memory use by each workload performed by each managed node, or network utilization data indicative of an amount of network bandwidth used by each workload performed by each managed node.

Example 59 includes the subject matter of any of Examples 49-58, and further including means for determining whether the assignment of the workloads is Pareto-efficient, wherein the means for determining an adjustment to the assignment of the workloads comprises means for determining, in response to a determination that the assignment of the workloads is not Pareto-efficient, an adjustment to the assignment of the workloads.

Example 60 includes the subject matter of any of Examples 49-59, and wherein the means for determining the adjustments comprises means for determining one or more node-specific adjustments indicative of changes to an availability of one or more resources of at least one of the managed nodes to one or more of the workloads performed by the managed node.

Example 61 includes the subject matter of any of Examples 49-60, and wherein the means for determining the node-specific adjustments comprises means for determining at least one of a processor throttle adjustment, a memory usage adjustment, a network bandwidth adjustment, or a fan speed adjustment.

Example 62 includes the subject matter of any of Examples 49-61, and wherein the means for applying the determined adjustments comprises means for issuing a request to perform a live migration of a workload between the managed nodes.

Example 63 includes the subject matter of any of Examples 49-62, and wherein the means for applying the determined adjustments comprises means for issuing a request to one of the managed nodes to apply one or more node-specific adjustments indicative of changes to an availability of one or more resources of the managed node to one or more of the workloads performed by the managed node.

1. An orchestrator server to allocate resources of a set of managednodes to workloads with a hierarchical model, the orchestrator servercomprising: one or more processors; one or more memory devices havingstored therein a plurality of instructions that, when executed by theone or more processors, cause the orchestrator server to: receiveresource allocation objective data indicative of multiple resourceallocation objectives to be satisfied; determine an initial assignmentof a set of workloads among the managed nodes; receive telemetry datafrom the managed nodes, wherein the telemetry data is indicative ofresource utilization by the managed nodes as the workloads areperformed; generate, as a function of the telemetry data, a hierarchicalmodel indicative of the resource utilization of each managed node and arelationship between the managed nodes; determine, with the hierarchicalmodel, differences in resource utilization for each workload as afunction of the managed node that performed the workload; determine, asa function of the telemetry data and the determined differences, anadjustment to the assignment of the workloads to increase an achievementof at least one of the resource allocation objectives without decreasingthe achievement of any of the other resource allocation objectives; andapply the adjustment to the assignments of the workloads among themanaged nodes as the workloads are performed.
 2. The orchestrator serverof claim 1, wherein to generate a hierarchical model comprises togenerate a hierarchical model indicative of a spatial relationshipbetween the managed nodes.
 3. The orchestrator server of claim 1,wherein to generate a hierarchical model comprises to generate ahierarchical model indicative of a membership of each managed node in afunctional group.
 4. The orchestrator server of claim 1, wherein theplurality of instructions, when executed, further cause the orchestratorserver to: determine node-specific weights indicative of the determineddifferences in resource utilization; and wherein to determine theadjustment to the workload assignments comprises to predict a change inresource utilization of a workload as a function of the node-specificweights associated with two different managed nodes.
 5. The orchestratorserver of claim 1, wherein the plurality of instructions, when executed,further cause the orchestrator server to: generate data analytics as afunction of the telemetry data; and store the data analytics inassociation with each managed node in the hierarchical model.
 6. Theorchestrator server of claim 5, wherein to generate the data analyticscomprises to generate profiles of the workloads, wherein the profilesare indicative of an identity of each workload and a resource usageclassification of each workload.
 7. The orchestrator server of claim 5,wherein to generate the data analytics comprises to predict futureresource utilization of the workloads.
 8. The orchestrator server ofclaim 7, wherein to predict future resource utilization of the workloadscomprises to identify potential contention for resources.
 9. Theorchestrator server of claim 1, wherein to receive resource allocationobjective data comprises to receive two or more of power consumptionobjective data indicative of a target power usage of one or more of themanaged nodes, performance objective data indicative of a target speedat which to perform the workloads, reliability objective data indicativeof a target life cycle of one or more of the managed nodes, or thermalobjective data indicative of a target temperature of one or more of themanaged nodes.
 10. The orchestrator server of claim 1, wherein toreceive telemetry data from the managed nodes comprises to receive atleast one of power consumption data indicative of an amount of powerconsumed by each managed node, performance data indicative of a speed atwhich the workloads are executed by each managed node, temperature dataindicative of a temperature within each managed node, processorutilization data indicative of an amount of processor usage consumed byeach workload performed by each managed node, memory utilization dataindicative of an amount or frequency of memory use by each workloadperformed by each managed node, or network utilization data indicativeof an amount of network bandwidth used by each workload performed byeach managed node.
 11. The orchestrator server of claim 1, wherein theplurality of instructions, when executed, further cause the orchestratorserver to determine whether the assignment of the workloads isPareto-efficient; and wherein to determine an adjustment to theassignment of the workloads comprises to determine, in response to adetermination that the assignment of the workloads is notPareto-efficient, an adjustment to the assignment of the workloads. 12.The orchestrator server of claim 1, wherein to determine the adjustmentscomprises to determine one or more node-specific adjustments indicativeof changes to an availability of one or more resources of at least oneof the managed nodes to one or more of the workloads performed by themanaged node.
 13. One or more machine-readable storage media comprisinga plurality of instructions stored thereon that, in response to beingexecuted, cause an orchestrator server to: receive resource allocationobjective data indicative of multiple resource allocation objectives tobe satisfied; determine an initial assignment of a set of workloadsamong the managed nodes; receive telemetry data from the managed nodes,wherein the telemetry data is indicative of resource utilization by themanaged nodes as the workloads are performed; generate, as a function ofthe telemetry data, a hierarchical model indicative of the resourceutilization of each managed node and a relationship between the managednodes; determine, with the hierarchical model, differences in resourceutilization for each workload as a function of the managed node thatperformed the workload; determine, as a function of the telemetry dataand the determined differences, an adjustment to the assignment of theworkloads to increase an achievement of at least one of the resourceallocation objectives without decreasing the achievement of any of theother resource allocation objectives; and apply the adjustment to theassignments of the workloads among the managed nodes as the workloadsare performed.
14. The one or more machine-readable storage media of claim 13, wherein to generate a hierarchical model comprises to generate a hierarchical model indicative of a spatial relationship between the managed nodes.
15. The one or more machine-readable storage media of claim 13, wherein to generate a hierarchical model comprises to generate a hierarchical model indicative of a membership of each managed node in a functional group.
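To make the hierarchical model of claims 13-15 concrete, the sketch below represents the model as a tree whose inner levels are groupings of managed nodes (e.g., a rack as a spatial relationship, or a functional group) and whose leaves carry per-node utilization, with aggregate utilization rolling up the tree. The tree layout and all names are assumptions of this sketch.

    # Hypothetical sketch: hierarchical model of managed nodes. Inner
    # levels capture relationships (spatial, such as a rack; or
    # functional group membership); leaves hold each managed node's
    # observed resource utilization.

    from dataclasses import dataclass, field

    @dataclass
    class ModelNode:
        name: str
        utilization: float = 0.0            # meaningful at leaves
        children: list = field(default_factory=list)

        def leaves(self):
            if not self.children:
                return [self]
            return [leaf for c in self.children for leaf in c.leaves()]

        def mean_utilization(self):
            """Average utilization across all managed nodes under this level."""
            leaves = self.leaves()
            return sum(n.utilization for n in leaves) / len(leaves)

    rack = ModelNode("rack-1", children=[
        ModelNode("sled-1", utilization=0.8),
        ModelNode("sled-2", utilization=0.4),
    ])
    print(rack.mean_utilization())  # 0.6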
16. The one or more machine-readable storage media of claim 13, wherein the plurality of instructions, when executed, further cause the orchestrator server to: determine node-specific weights indicative of the determined differences in resource utilization; and wherein to determine the adjustment to the workload assignments comprises to predict a change in resource utilization of a workload as a function of the node-specific weights associated with two different managed nodes.
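As an illustration of claim 16 only, the following sketch assumes a simple proportional model in which each node's weight summarizes how its observed resource utilization for a workload differs from that of other nodes; the ratio of two nodes' weights then predicts the change in utilization if the workload were moved between them. The model and all names are assumptions of this sketch.

    # Hypothetical sketch: node-specific weights predict how a workload's
    # resource utilization changes when reassigned between two managed nodes.

    def predict_utilization(observed_util, weight_source, weight_target):
        """Scale the utilization observed on the source node by the
        target/source weight ratio to estimate post-migration utilization."""
        return observed_util * (weight_target / weight_source)

    # A workload consumes 60% of a processor on the source node; the target
    # node's weight indicates it needs ~80% as much processor for the same work.
    print(predict_utilization(0.60, weight_source=1.0, weight_target=0.8))  # 0.48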
17. The one or more machine-readable storage media of claim 13, wherein the plurality of instructions, when executed, further cause the orchestrator server to: generate data analytics as a function of the telemetry data; and store the data analytics in association with each managed node in the hierarchical model.
18. The one or more machine-readable storage media of claim 17, wherein to generate the data analytics comprises to generate profiles of the workloads, wherein the profiles are indicative of an identity of each workload and a resource usage classification of each workload.
19. The one or more machine-readable storage media of claim 17, wherein to generate the data analytics comprises to predict future resource utilization of the workloads.
20. The one or more machine-readable storage media of claim 19, wherein to predict future resource utilization of the workloads comprises to identify potential contention for resources.
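To illustrate how predicted future resource utilization may reveal potential contention (claims 8 and 20), the sketch below sums per-resource predictions for workloads co-located on a managed node and flags any resource whose projected demand exceeds capacity. The data layout and the unit-capacity threshold are assumptions of this sketch.

    # Hypothetical sketch: identify potential resource contention from
    # predicted utilizations of workloads assigned to the same managed node.

    def find_contention(predicted, capacity=1.0):
        """predicted maps workload -> {resource: predicted utilization}.
        Returns the resources whose combined demand exceeds capacity."""
        totals = {}
        for usage in predicted.values():
            for resource, util in usage.items():
                totals[resource] = totals.get(resource, 0.0) + util
        return {r: t for r, t in totals.items() if t > capacity}

    predicted = {"wl-1": {"cpu": 0.60, "mem": 0.30},
                 "wl-2": {"cpu": 0.55, "mem": 0.20}}
    print(find_contention(predicted))  # {'cpu': 1.15} -> potential contention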
21. The one or more machine-readable storage media of claim 13, wherein to receive resource allocation objective data comprises to receive two or more of power consumption objective data indicative of a target power usage of one or more of the managed nodes, performance objective data indicative of a target speed at which to perform the workloads, reliability objective data indicative of a target life cycle of one or more of the managed nodes, or thermal objective data indicative of a target temperature of one or more of the managed nodes.
22. The one or more machine-readable storage media of claim 13, wherein to receive telemetry data from the managed nodes comprises to receive at least one of power consumption data indicative of an amount of power consumed by each managed node, performance data indicative of a speed at which the workloads are executed by each managed node, temperature data indicative of a temperature within each managed node, processor utilization data indicative of an amount of processor usage consumed by each workload performed by each managed node, memory utilization data indicative of an amount or frequency of memory use by each workload performed by each managed node, or network utilization data indicative of an amount of network bandwidth used by each workload performed by each managed node.
23. The one or more machine-readable storage media of claim 13, wherein the plurality of instructions, when executed, further cause the orchestrator server to determine whether the assignment of the workloads is Pareto-efficient; and wherein to determine an adjustment to the assignment of the workloads comprises to determine, in response to a determination that the assignment of the workloads is not Pareto-efficient, an adjustment to the assignment of the workloads.
24. The one or more machine-readable storage media of claim 13, wherein to determine the adjustments comprises to determine one or more node-specific adjustments indicative of changes to an availability of one or more resources of at least one of the managed nodes to one or more of the workloads performed by the managed node.
25. An orchestrator server to allocate resources of a set of managed nodes to workloads based on resource utilization phase residencies, the orchestrator server comprising: circuitry for receiving resource allocation objective data indicative of multiple resource allocation objectives to be satisfied; circuitry for determining an initial assignment of a set of workloads among the managed nodes; circuitry for receiving telemetry data from the managed nodes, wherein the telemetry data is indicative of resource utilization by each of the managed nodes as the workloads are performed; means for determining, as a function of the telemetry data, phase residency data indicative of temporal lengths of resource utilization phases of the workloads, wherein each resource utilization phase is indicative of a utilization of a managed node component that satisfies a threshold amount; means for determining, as a function of at least the phase residency data and the resource allocation objective data, an adjustment to the assignment of the workloads to increase an achievement of at least one of the resource allocation objectives without decreasing the achievement of any of the other resource allocation objectives; and means for applying the adjustment to the assignments of the workloads among the managed nodes as the workloads are performed.
26. A method for allocating resources of a set of managed nodes to workloads based on resource utilization phase residencies, the method comprising: receiving, by an orchestrator server, resource allocation objective data indicative of multiple resource allocation objectives to be satisfied; determining, by the orchestrator server, an initial assignment of a set of workloads among the managed nodes; receiving, by the orchestrator server, telemetry data from the managed nodes, wherein the telemetry data is indicative of resource utilization by each of the managed nodes as the workloads are performed; determining, by the orchestrator server and as a function of the telemetry data, phase residency data indicative of temporal lengths of resource utilization phases of the workloads, wherein each resource utilization phase is indicative of a utilization of a managed node component that satisfies a threshold amount; determining, by the orchestrator server and as a function of at least the phase residency data and the resource allocation objective data, an adjustment to the assignment of the workloads to increase an achievement of at least one of the resource allocation objectives without decreasing the achievement of any of the other resource allocation objectives; and applying, by the orchestrator server, the adjustment to the assignments of the workloads among the managed nodes as the workloads are performed.
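By way of illustration of the phase residency determination in claims 25 and 26, the sketch below derives phase residency data from a time-ordered series of utilization samples for one managed node component: a phase is a contiguous run of samples that satisfies a threshold amount, and its residency is the run's temporal length. The sampling format and the threshold value are assumptions of this sketch.

    # Hypothetical sketch: compute the (start, duration) of each resource
    # utilization phase from telemetry samples for a single component.

    def phase_residencies(samples, threshold=0.7):
        """samples is a time-ordered list of (timestamp_seconds, utilization)
        pairs; returns (start, duration) for each phase at or above threshold."""
        phases, start = [], None
        for t, util in samples:
            if util >= threshold and start is None:
                start = t                          # phase begins
            elif util < threshold and start is not None:
                phases.append((start, t - start))  # phase ends
                start = None
        if start is not None:                      # phase open at end of trace
            phases.append((start, samples[-1][0] - start))
        return phases

    samples = [(0, 0.2), (1, 0.8), (2, 0.9), (3, 0.3), (4, 0.75), (5, 0.8)]
    print(phase_residencies(samples))  # [(1, 2), (4, 1)]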
27. The method of claim 26, wherein determining the adjustment to the assignment of the workloads comprises: identifying complementary workload utilization phases indicative of resource utilization phases of different managed node components by two or more workloads; and determining an alignment of the complementary workload utilization phases to cause the complementary workload utilization phases to be performed concurrently by the same managed node.
28. The method of claim 27, wherein: determining the alignment of complementary workload phases comprises determining a time offset for execution of one or more of the workloads; and applying the adjustments to the assignments comprises: temporarily suspending execution of the one or more of the workloads; and resuming execution of the one or more of the workloads after the time offset has elapsed.
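As an illustration of claims 27 and 28 only, the sketch below assumes two workloads whose utilization phases repeat with a common period: delaying one workload by a computed time offset aligns its memory-bound phase with the other workload's compute-bound phase, so the complementary phases run concurrently on the same managed node without contending for the same component. The periodic-phase model and all names are assumptions of this sketch.

    # Hypothetical sketch: time offset that aligns complementary phases of
    # two workloads sharing a managed node.

    def alignment_offset(phase_a_start, phase_b_start, period):
        """Seconds to suspend workload B so that its phase begins at the
        same point in the period as workload A's complementary phase."""
        return (phase_a_start - phase_b_start) % period

    # Workload A's compute-bound phase starts 12 s into each 60 s period and
    # workload B's memory-bound phase starts 40 s in; suspending B for 32 s
    # and then resuming it aligns the two phases.
    print(alignment_offset(12, 40, 60))  # 32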